Zend FR

Zend_Search_Lucene: searching with UTF-8

m4r14ch1 · 05-04-2012 15:36:10

Hello everyone.
I have integrated Zend_Search_Lucene into my project and it works very well.

My problem is with searching. Let me explain.

I indexed the word "Mathématique", encoded in UTF-8, and it is stored in my index as "MathÃ©matique".
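
The stored form "MathÃ©matique" looks like valid UTF-8 bytes being displayed as ISO-8859-1, in which case the index content would be fine and only the viewer misreads it. That is an assumption about how the index was inspected, not something the post confirms. A minimal sketch that reproduces the garbled form:

```php
<?php
// "é" is the two bytes C3 A9 in UTF-8.
$utf8 = "Math\xC3\xA9matique";

// Treat those bytes as ISO-8859-1 and re-encode them to UTF-8:
// this reproduces the garbled spelling quoted above.
$garbled = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo $garbled, "\n"; // prints "MathÃ©matique"
```

If the same spelling appears when the index is read back with the correct encoding, the data was double-encoded before indexing and the input encoding should be checked.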

Searching for "Mathématique" works, but searching for "Mathematique" does not.

What I want is to be able to search with "Mathématique", "Mathematique", "Mathe", "Mathé", "ématique" and "ematique".

I tried wildcards together with fuzzy search, but I get an error saying that I cannot use both at the same time.

My code:

Code:

Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding("UTF-8");
Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
 
$index = Zend_Search_Lucene::open( APPLICATION_PATH . '/data/index' );
 
$q = $this->_getParam('wath', '');
// Wildcards (*) and a fuzzy modifier (~0.8) on the same term: this is what raises the error
$query = Zend_Search_Lucene_Search_QueryParser::parse('*'.$q.'*~0.8');
 
$results = $index->find($query);
 
foreach ($results as $value)
{
        echo $value->branche . '<br />';
}

lebilien · 05-04-2012 15:57:21

Wouldn't you rather use metaphone?

http://php.net/manual/fr/function.metaphone.php

m4r14ch1 · 05-04-2012 16:01:37

Thank you for the very quick reply.

Where should I use metaphone: when building the index or when searching?

lebilien · 06-04-2012 09:35:37

For my searches I use a metaphone field and then apply levenshtein, but that takes us away from Lucene.
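
One possible reading of the "metaphone field, then levenshtein" idea, as a rough sketch only. The function name and the distance threshold are hypothetical, the input is assumed to be accent-free ASCII, and note that PHP's metaphone() follows English pronunciation rules, so results on French words are approximate.

```php
<?php
// Hypothetical helper: rank candidates by how close their metaphone key
// is to the key of the query (Levenshtein distance between the two keys).
function rankByMetaphone(string $query, array $candidates, int $maxKeyDistance = 1): array
{
    $queryKey = metaphone(strtolower($query));
    $ranked = [];

    foreach ($candidates as $candidate) {
        $distance = levenshtein($queryKey, metaphone(strtolower($candidate)));
        if ($distance <= $maxKeyDistance) {
            $ranked[$candidate] = $distance;
        }
    }

    asort($ranked); // smallest distance first
    return $ranked;
}

$hits = rankByMetaphone('Mathematique', ['physique', 'mathematique']);
```

In a real application the metaphone key would be computed once at indexing time and stored as its own field, so that only the query key is computed per search.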

m4r14ch1 · 06-04-2012 17:02:06

Here is the solution I implemented. I don't know whether it is the right one, but it fixed the problem for me.

I created an Analyzer that strips all accents and converts the string to lowercase.
I based it on Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8.

Code:

class Application_Analyzer_Accent extends Zend_Search_Lucene_Analysis_Analyzer_Common
{
    /**
     * Current char position in an UTF-8 stream
     *
     * @var integer
     */
    private $_position;

    /**
     * Current binary position in an UTF-8 stream
     *
     * @var integer
     */
    private $_bytePosition;

    /**
     * Object constructor
     *
     * @throws Zend_Search_Lucene_Exception
     */
    public function __construct()
    {
        if (@preg_match('/\pL/u', 'a') != 1) {
            // PCRE unicode support is turned off
            require_once 'Zend/Search/Lucene/Exception.php';
            throw new Zend_Search_Lucene_Exception('Utf8 analyzer needs PCRE unicode support to be enabled.');
        }
    }

    /**
     * Reset token stream
     */
    public function reset()
    {
        $this->_position     = 0;
        $this->_bytePosition = 0;

        // convert input into UTF-8
        if (strcasecmp($this->_encoding, 'utf8' ) != 0  &&
            strcasecmp($this->_encoding, 'utf-8') != 0 ) {
                $this->_input = iconv($this->_encoding, 'UTF-8', $this->_input);
                $this->_encoding = 'UTF-8';
        }

        // Strip accents and lowercase once per input, not on every nextToken() call
        if ($this->_input !== null) {
            $this->_input = strtolower($this->deleteAccent($this->_input));
        }
    }

    /**
     * Tokenization stream API
     * Get next token
     * Returns null at the end of stream
     *
     * @return Zend_Search_Lucene_Analysis_Token|null
     */
    public function nextToken()
    {
        if ($this->_input === null) {
            return null;
        }

        do {
            if (! preg_match('/[\p{L}]+/u', $this->_input, $match, PREG_OFFSET_CAPTURE, $this->_bytePosition)) {
                // Covers both cases: a) there are no matches (preg_match(...) === 0)
                // b) an error occurred (preg_match(...) === FALSE)
                return null;
            }
            
            // matched string
            $matchedWord = $match[0][0];

            // binary position of the matched word in the input stream
            $binStartPos = $match[0][1];

            // character position of the matched word in the input stream
            $startPos = $this->_position +
                        iconv_strlen(substr($this->_input,
                                            $this->_bytePosition,
                                            $binStartPos - $this->_bytePosition),
                                     'UTF-8');
            // character position of the end of matched word in the input stream
            $endPos = $startPos + iconv_strlen($matchedWord, 'UTF-8');

            $this->_bytePosition = $binStartPos + strlen($matchedWord);
            $this->_position     = $endPos;

            $token = $this->normalize(new Zend_Search_Lucene_Analysis_Token($matchedWord, $startPos, $endPos));
        } while ($token === null); // try again if token is skipped
        //print_r($token);
        return $token;
    }
    
    private function deleteAccent($str)
    {
        return iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $str);
    }
}

I believe iconv() gives different results depending on the OS.
My example was run on Linux.
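
Since the transliteration result depends on the platform's iconv and on the locale, one way to make the folding predictable is to probe iconv first and fall back to an explicit character map. This is only a sketch: foldAccents() is a hypothetical helper, not part of Zend Framework, and the map covers just the common French letters.

```php
<?php
// Hypothetical helper: fold accents to ASCII and lowercase the result.
function foldAccents(string $text): string
{
    // Probe: does this platform's iconv turn "é" (C3 A9) into a plain "e"?
    $probe = @iconv('UTF-8', 'ASCII//TRANSLIT', "\xC3\xA9");
    if ($probe === 'e') {
        $folded = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
        if ($folded !== false) {
            return strtolower($folded);
        }
    }

    // Fallback: explicit map, independent of OS and locale.
    static $map = [
        'à' => 'a', 'â' => 'a', 'ä' => 'a', 'ç' => 'c',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'À' => 'a', 'Â' => 'a', 'Ç' => 'c', 'É' => 'e', 'È' => 'e',
        'Ê' => 'e', 'Î' => 'i', 'Ô' => 'o', 'Ù' => 'u', 'Û' => 'u',
    ];
    return strtolower(strtr($text, $map));
}

echo foldAccents("Math\xC3\xA9matique"), "\n";
```

The two paths are not equivalent for text outside the Latin alphabet: the map leaves such characters untouched, while an ASCII iconv target may drop or replace them. That matters if a field such as ar_branche holds Arabic text, which is an assumption based on the field name.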

The action that builds the index:

Code:

setlocale(LC_CTYPE, 'fr_FR.utf8');
        
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Application_Analyzer_Accent());

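// A Lucene index is a directory, so is_file() is always false here; test with is_dir().
// Note: opening an existing index appends to it, so re-running this action adds the documents again.
if (is_dir(APPLICATION_PATH . '/data/index'))
{
    $index = Zend_Search_Lucene::open(APPLICATION_PATH . '/data/index');
}
else
{
    $index = Zend_Search_Lucene::create(APPLICATION_PATH . '/data/index');
}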

$Branches = new Application_Model_Branches(array('ecoleId' => $this->ecole_id));
$results = $Branches->fetchAll();   
    
foreach ($results as $result) 
{
    $doc = new Zend_Search_Lucene_Document();
        
    $doc->addField(Zend_Search_Lucene_Field::Text('branche', $result->branche, 'utf-8'));
        
    $doc->addField(Zend_Search_Lucene_Field::Text('ar_branche', $result->arBranche, 'utf-8'));
        
    $index->addDocument($doc); 
}

$index->commit(); 
$index->optimize();

The search action:

Code:

setlocale(LC_CTYPE, 'fr_FR.utf8');
Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Application_Analyzer_Accent());

$index = Zend_Search_Lucene::open( APPLICATION_PATH . '/data/index' );

$q = $this->_getParam('wath', '');
$results = $index->find('*"'.$q.'"*');

foreach ($results as $value)
{
    echo $value->branche . '<br />';
}
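
As a quick sanity check outside Lucene, the six queries from the first post can be tested against the folded term. This sketch assumes a `*q*` wildcard on a single term behaves like a substring test, and fold() is a hypothetical stand-in for the analyzer that only handles "é" and "É".

```php
<?php
// Hypothetical stand-in for the analyzer's folding, limited to "é" / "É".
function fold(string $s): string
{
    return strtolower(str_replace(['é', 'É'], 'e', $s));
}

$indexedTerm = fold('Mathématique'); // the term as the analyzer would store it
$queries = ['Mathématique', 'Mathematique', 'Mathe', 'Mathé', 'ématique', 'ematique'];

$matches = [];
foreach ($queries as $q) {
    // Fold the query the same way, then test for a substring match.
    $matches[$q] = strpos($indexedTerm, fold($q)) !== false;
}

foreach ($matches as $q => $found) {
    echo $q, ': ', $found ? 'match' : 'no match', "\n";
}
```

The point of the check is that the same folding must be applied at indexing time and at query time, which is why both actions above set the same default analyzer.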
