An Analyzer is used to analyze text.

It thus represents a policy for extracting index terms from text.

Note: The Java Lucene implementation is stream-oriented, which lets it process very large documents (more than 20 MB) efficiently. Zend_Search_Lucene itself, however, is not designed for documents of that size, so its analysis API works with data strings and sets (arrays) instead of streams.

category Zend
package Zend_Search_Lucene
subpackage Analysis
copyright Copyright (c) 2005-2015 Zend Technologies USA Inc. (http://www.zend.com)
license New BSD License

 Methods

Return the default Analyzer implementation used by indexing code.

getDefault() : \Zend_Search_Lucene_Analysis_Analyzer
Static

Returns

\Zend_Search_Lucene_Analysis_Analyzer

Tokenization stream API. Get the next token; returns null at the end of the stream.

nextToken() : \Zend_Search_Lucene_Analysis_Token | null

Tokens are returned in UTF-8 (the internal Zend_Search_Lucene encoding).

Returns

\Zend_Search_Lucene_Analysis_Token | null

Reset the token stream.

reset() 
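To make the pull-style contract concrete, here is a minimal, self-contained PHP sketch of a consumer loop over `nextToken()` followed by a `reset()` rewind. `SketchToken` and `WhitespaceSketchAnalyzer` are hypothetical stand-ins, not Zend_Search_Lucene classes. The sketch assumes the analyzer keeps a cursor into the input string and that `setInput()` rewinds it; it splits on ASCII whitespace and works on raw bytes, recording `$encoding` without using it, whereas the documented analyzers return UTF-8 tokens.

```php
<?php
// Hypothetical stand-in for Zend_Search_Lucene_Analysis_Token.
class SketchToken
{
    public $text;
    public $start;
    public $end;

    public function __construct(string $text, int $start, int $end)
    {
        $this->text  = $text;
        $this->start = $start;
        $this->end   = $end;
    }
}

// Hypothetical analyzer: one whitespace-separated token per nextToken() call.
class WhitespaceSketchAnalyzer
{
    private $input = '';
    private $encoding = '';  // recorded only; this sketch treats input as bytes
    private $position = 0;   // byte offset where the next scan starts

    public function setInput(string $data, string $encoding = ''): void
    {
        $this->input    = $data;
        $this->encoding = $encoding;
        $this->reset();
    }

    // Rewind so the next nextToken() call starts again from the first byte.
    public function reset(): void
    {
        $this->position = 0;
    }

    // Return the next token, or null once the input is exhausted.
    public function nextToken(): ?SketchToken
    {
        $length = strlen($this->input);
        while ($this->position < $length && $this->isSpace($this->input[$this->position])) {
            $this->position++;   // skip separators
        }
        if ($this->position >= $length) {
            return null;         // end of stream
        }
        $start = $this->position;
        while ($this->position < $length && !$this->isSpace($this->input[$this->position])) {
            $this->position++;   // consume the token body
        }
        $text = substr($this->input, $start, $this->position - $start);
        return new SketchToken($text, $start, $this->position);
    }

    private function isSpace(string $char): bool
    {
        return strpos(" \t\r\n", $char) !== false;
    }
}

$analyzer = new WhitespaceSketchAnalyzer();
$analyzer->setInput("Zend Search  Lucene");

// Pull tokens until nextToken() signals the end of the stream with null.
$terms = [];
while (($token = $analyzer->nextToken()) !== null) {
    $terms[] = $token->text;
}
// $terms is ['Zend', 'Search', 'Lucene']

// After reset(), the same input is read again from the start.
$analyzer->reset();
$first = $analyzer->nextToken();
// $first->text is 'Zend', $first->start is 0
```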

Set the default Analyzer implementation used by indexing code.

setDefault(\Zend_Search_Lucene_Analysis_Analyzer $analyzer) 
Static

Parameters

$analyzer

\Zend_Search_Lucene_Analysis_Analyzer
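A minimal sketch of the static default slot behind `getDefault()` and `setDefault()`, again with hypothetical class names (`SketchAnalyzer`, `FallbackSketchAnalyzer`, `CustomSketchAnalyzer`). It assumes that `getDefault()` lazily creates some fallback analyzer when `setDefault()` has never been called; this page does not say which analyzer that is, so the fallback class here is purely illustrative.

```php
<?php
// Hypothetical stand-in for the analyzer base class and its static default slot.
abstract class SketchAnalyzer
{
    /** @var SketchAnalyzer|null mirrors the static $_defaultImpl property */
    private static $defaultImpl = null;

    public static function setDefault(SketchAnalyzer $analyzer): void
    {
        self::$defaultImpl = $analyzer;
    }

    public static function getDefault(): SketchAnalyzer
    {
        // Assumption: fall back to a built-in analyzer when none has been set.
        if (self::$defaultImpl === null) {
            self::$defaultImpl = new FallbackSketchAnalyzer();
        }
        return self::$defaultImpl;
    }
}

// Hypothetical concrete analyzers; real ones would implement reset()/nextToken().
class FallbackSketchAnalyzer extends SketchAnalyzer {}
class CustomSketchAnalyzer extends SketchAnalyzer {}

// Before any setDefault() call, the lazily created fallback is returned.
$initial = SketchAnalyzer::getDefault();

// After setDefault(), every later getDefault() call sees the replacement.
SketchAnalyzer::setDefault(new CustomSketchAnalyzer());
$replaced = SketchAnalyzer::getDefault();
```

Because the slot is static, in this sketch it behaves like process-wide configuration: any code that asks for the default after a `setDefault()` call receives the replacement instance.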

Tokenization stream API. Set the input.

setInput(string $data, $encoding = '')

Parameters

$data

string

$encoding

Tokenize text into terms. Returns an array of Zend_Search_Lucene_Analysis_Token objects.

tokenize(string $data, $encoding = '') : array

Tokens are returned in UTF-8 (the internal Zend_Search_Lucene encoding).

Parameters

$data

string

$encoding

Returns

array
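The sketch below shows one plausible way the methods fit together: `tokenize()` calls `setInput()` and then drains `nextToken()` into an array, in line with the note at the top of this page that the analysis API works with whole strings and arrays rather than streams. Treat that wiring as an assumption. `SketchAnalyzer`, `LetterSketchAnalyzer` and `SketchToken` are hypothetical names; the concrete analyzer lowercases runs of ASCII letters and records `$encoding` without converting the input.

```php
<?php
// Hypothetical token value object; stands in for Zend_Search_Lucene_Analysis_Token.
class SketchToken
{
    public $text;
    public $start;
    public $end;

    public function __construct(string $text, int $start, int $end)
    {
        $this->text  = $text;
        $this->start = $start;
        $this->end   = $end;
    }
}

// Hypothetical base class mirroring the documented method set.
abstract class SketchAnalyzer
{
    protected $input = null;   // mirrors $_input
    protected $encoding = '';  // mirrors $_encoding

    // Store the whole input string, then rewind the token stream.
    public function setInput(string $data, string $encoding = ''): void
    {
        $this->input    = $data;
        $this->encoding = $encoding;
        $this->reset();
    }

    // Assumption: tokenize() is setInput() followed by draining nextToken().
    public function tokenize(string $data, string $encoding = ''): array
    {
        $this->setInput($data, $encoding);
        $tokens = [];
        while (($token = $this->nextToken()) !== null) {
            $tokens[] = $token;
        }
        return $tokens;
    }

    abstract public function reset(): void;
    abstract public function nextToken(): ?SketchToken;
}

// Hypothetical analyzer: lowercased runs of ASCII letters become terms.
class LetterSketchAnalyzer extends SketchAnalyzer
{
    private $position = 0;

    public function reset(): void
    {
        $this->position = 0;
    }

    public function nextToken(): ?SketchToken
    {
        // Find the next letter run at or after the cursor; offsets are absolute bytes.
        if (!preg_match('/[A-Za-z]+/', $this->input, $m, PREG_OFFSET_CAPTURE, $this->position)) {
            return null;
        }
        [$text, $start] = $m[0];
        $this->position = $start + strlen($text);
        return new SketchToken(strtolower($text), $start, $this->position);
    }
}

$analyzer = new LetterSketchAnalyzer();
$tokens   = $analyzer->tokenize('Hello, Lucene 2.0 world');
$terms    = array_map(function (SketchToken $t) { return $t->text; }, $tokens);
// $terms is ['hello', 'lucene', 'world']

// Reusing the analyzer works because setInput() calls reset().
$again = $analyzer->tokenize('Zend');
// $again holds one token, 'zend', starting at offset 0
```

In this sketch only `reset()` and `nextToken()` are abstract, so a subclass defines just its term-extraction policy while the string-in, array-out `tokenize()` stays in the base class.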

 Properties

 

Input string encoding

$_encoding : string

Default

''
 

Input string

$_input : string

Default

null
 

The Analyzer implementation used by default.

$_defaultImpl : \Zend_Search_Lucene_Analysis_Analyzer

Default

Static