org.apache.cocoon.transformation
Class LuceneIndexTransformer

java.lang.Object
  extended by org.apache.avalon.framework.logger.AbstractLogEnabled
      extended by org.apache.cocoon.xml.AbstractXMLProducer
          extended by org.apache.cocoon.xml.AbstractXMLPipe
              extended by org.apache.cocoon.transformation.AbstractTransformer
                  extended by org.apache.cocoon.transformation.LuceneIndexTransformer
All Implemented Interfaces:
Poolable, Recyclable, Component, Configurable, Contextualizable, LogEnabled, CacheableProcessingComponent, SitemapModelComponent, Transformer, XMLPipe, XMLProducer, XMLConsumer, ContentHandler, LexicalHandler

public class LuceneIndexTransformer
extends AbstractTransformer
implements CacheableProcessingComponent, Configurable, Contextualizable

A Lucene index creation transformer.

This transformer reads a document with elements in the namespace http://apache.org/cocoon/lucene/1.0 and either creates a new Lucene index or updates an existing one.

Its parameters can be set in three places: in the sitemap component configuration, as parameters to the transformation step in the pipeline, or as attributes of the root element in the source XML document. Attributes in the source document override the transformation parameters, which in turn override the component configuration.

directory

Location of the directory where index files are stored. This path is relative to the Cocoon work directory.

create

This attribute controls whether the index is recreated.

max-field-length

Maximum number of terms to index in a field. As far as the index is concerned, the document is effectively truncated at this point. The default value, 10k, may not be sufficient for large documents.

analyzer

Class name of the Lucene text analyzer to use. Typically depends on the language of the text being indexed. See the Lucene documentation for more information.

merge-factor

Determines how often segment indices are merged. See the Lucene documentation for more information.

optimize-frequency

Determines how often the Lucene index is optimized. With thousands of documents, optimizing the index can become quite slow (e.g. 7 seconds for 9000 small documents on a P4).
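
To make the effect of max-field-length concrete, here is a minimal sketch; it is not the transformer's actual code. It assumes plain whitespace tokenization, whereas a real Lucene analyzer tokenizes differently. The class FieldTruncation and the method indexedTerms are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of term truncation; not part of Cocoon or Lucene. */
final class FieldTruncation {

    /**
     * Returns the terms that would still be indexed when a field is cut off
     * after maxFieldLength terms. Assumes whitespace tokenization and a
     * non-negative limit.
     */
    static List<String> indexedTerms(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        List<String> terms = Arrays.asList(trimmed.split("\\s+"));
        // Everything past the limit is invisible to the index.
        return terms.subList(0, Math.min(maxFieldLength, terms.size()));
    }

    public static void main(String[] args) {
        String body = "The quick brown fox jumped over the lazy dog";
        System.out.println(indexedTerms(body, 4));
    }
}
```

With a limit of 4, only the first four terms of the sample text remain searchable; a query for a later term would not match this field.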

A simple example of the input:
<?xml version="1.0" encoding="UTF-8"?>
 <lucene:index xmlns:lucene="http://apache.org/cocoon/lucene/1.0" 
     merge-factor="20" 
     create="false" 
     directory="index" 
     max-field-length="10000"
     optimize-frequency="1" 
     analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer">
     <lucene:document url="a.html">
             <documentTitle lucene:store="true">Doggerel</documentTitle>
             <body>The quick brown fox jumped over the lazy dog</body>    
     </lucene:document>
     <lucene:document url="b.html">
             <documentTitle lucene:store="true">Lorem Ipsum</documentTitle>
             <body>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</body>
             <body>Nunc a mauris blandit ligula scelerisque tristique.</body>    
     </lucene:document>
 </lucene:index>
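
The transformer receives a document like the one above as SAX events. The sketch below is not Cocoon code: it uses only the JDK's SAX parser to show how lucene:document elements and their url attributes can be recognized by namespace. The class IndexInputReader and the method documentUrls are hypothetical names used for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader for the index input format; not part of Cocoon. */
final class IndexInputReader {

    static final String LUCENE_NS = "http://apache.org/cocoon/lucene/1.0";

    /** Collects the url attribute of every lucene:document element, in document order. */
    static List<String> documentUrls(String xml) {
        List<String> urls = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required to match on namespace URI
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (LUCENE_NS.equals(uri) && "document".equals(localName)) {
                            // url carries no prefix, so its namespace URI is empty
                            urls.add(atts.getValue("", "url"));
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse index input", e);
        }
        return urls;
    }

    public static void main(String[] args) {
        String xml = "<lucene:index xmlns:lucene='" + LUCENE_NS + "'>"
                + "<lucene:document url='a.html'><body>text</body></lucene:document>"
                + "</lucene:index>";
        System.out.println(documentUrls(xml));
    }
}
```

Matching on the namespace URI rather than on the lucene: prefix keeps the check valid even if the input binds the namespace to a different prefix.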
 

Version:
$Id: LuceneIndexTransformer.html 1304258 2012-03-23 10:09:27Z ilgrosso $

Field Summary
static String ANALYZER_CLASSNAME_CONFIG
           
static String ANALYZER_CLASSNAME_DEFAULT
           
static String ANALYZER_CLASSNAME_PARAMETER
           
static String CDATA
           
static String DIRECTORY_CONFIG
           
static String DIRECTORY_DEFAULT
           
static String DIRECTORY_PARAMETER
           
static String LUCENE_DOCUMENT_ELEMENT
           
static String LUCENE_DOCUMENT_URL_ATTRIBUTE
           
static String LUCENE_ELAPSED_TIME_ATTRIBUTE
           
static String LUCENE_ELEMENT_ATTR_STORE_VALUE
           
static String LUCENE_ELEMENT_ATTR_TO_TEXT_ATTRIBUTE
           
static String LUCENE_QUERY_ANALYZER_ATTRIBUTE
           
static String LUCENE_QUERY_CREATE_ATTRIBUTE
           
static String LUCENE_QUERY_DIRECTORY_ATTRIBUTE
           
static String LUCENE_QUERY_ELEMENT
           
static String LUCENE_QUERY_MAX_FIELD_LENGTH_ATTRIBUTE
           
static String LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE
           
static String LUCENE_QUERY_OPTIMIZE_FREQUENCY_CONFIG_ATTRIBUTE
           
static String LUCENE_URI
           
static String MAX_FIELD_LENGTH_CONFIG
           
static int MAX_FIELD_LENGTH_DEFAULT
           
static String MAX_FIELD_LENGTH_PARAMETER
           
static String MERGE_FACTOR_CONFIG
           
static int MERGE_FACTOR_DEFAULT
           
static String MERGE_FACTOR_PARAMETER
           
static String OPTIMIZE_FREQUENCY_CONFIG
           
static int OPTIMIZE_FREQUENCY_DEFAULT
           
static String OPTIMIZE_FREQUENCY_PARAMETER
           
protected  File workDir
           
 
Fields inherited from class org.apache.cocoon.xml.AbstractXMLProducer
contentHandler, EMPTY_CONTENT_HANDLER, lexicalHandler, xmlConsumer
 
Fields inherited from interface org.apache.cocoon.transformation.Transformer
ROLE
 
Constructor Summary
LuceneIndexTransformer()
           
 
Method Summary
 void characters(char[] ch, int start, int length)
          Receive notification of character data.
 void configure(Configuration conf)
          Configure the transformer.
 void contextualize(Context context)
          Contextualize this class.
 void endDocument()
          Receive notification of the end of a document.
 void endElement(String namespaceURI, String localName, String qName)
          Receive notification of the end of an element.
 void endPrefixMapping(String prefix)
          End the scope of a prefix-URI mapping.
 String getAnalyzer()
           
 String getDirectory()
           
 Serializable getKey()
          Generate the unique key.
 int getMaxFieldLength()
           
 int getMergeFactor()
           
 int getOptimizeFrequency()
           
 SourceValidity getValidity()
          Generate the validity object.
 void recycle()
          Recycle the producer by removing references, and resetting handlers to null (empty) implementations.
 void setAnalyzer(String analyzer)
           
 void setDirectory(String directory)
           
 void setMaxFieldLength(int maxFieldLength)
           
 void setMergeFactor(int mergeFactor)
           
 void setOptimizeFrequency(int optimizeFrequency)
           
 void setup(SourceResolver resolver, Map objectModel, String src, Parameters parameters)
          Setup the transformer.
 void startDocument()
          Receive notification of the beginning of a document.
 void startElement(String namespaceURI, String localName, String qName, Attributes atts)
          Receive notification of the beginning of an element.
 void startPrefixMapping(String prefix, String uri)
          Begin the scope of a prefix-URI Namespace mapping.
 
Methods inherited from class org.apache.cocoon.xml.AbstractXMLPipe
comment, endCDATA, endDTD, endEntity, ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity, startCDATA, startDTD, startEntity
 
Methods inherited from class org.apache.cocoon.xml.AbstractXMLProducer
setConsumer, setContentHandler, setLexicalHandler
 
Methods inherited from class org.apache.avalon.framework.logger.AbstractLogEnabled
enableLogging, getLogger, setupLogger, setupLogger, setupLogger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.xml.sax.ContentHandler
ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity
 
Methods inherited from interface org.xml.sax.ext.LexicalHandler
comment, endCDATA, endDTD, endEntity, startCDATA, startDTD, startEntity
 
Methods inherited from interface org.apache.cocoon.xml.XMLProducer
setConsumer
 

Field Detail

ANALYZER_CLASSNAME_CONFIG

public static final String ANALYZER_CLASSNAME_CONFIG
See Also:
Constant Field Values

ANALYZER_CLASSNAME_PARAMETER

public static final String ANALYZER_CLASSNAME_PARAMETER
See Also:
Constant Field Values

ANALYZER_CLASSNAME_DEFAULT

public static final String ANALYZER_CLASSNAME_DEFAULT
See Also:
Constant Field Values

DIRECTORY_CONFIG

public static final String DIRECTORY_CONFIG
See Also:
Constant Field Values

DIRECTORY_PARAMETER

public static final String DIRECTORY_PARAMETER
See Also:
Constant Field Values

DIRECTORY_DEFAULT

public static final String DIRECTORY_DEFAULT
See Also:
Constant Field Values

MERGE_FACTOR_CONFIG

public static final String MERGE_FACTOR_CONFIG
See Also:
Constant Field Values

MERGE_FACTOR_PARAMETER

public static final String MERGE_FACTOR_PARAMETER
See Also:
Constant Field Values

MERGE_FACTOR_DEFAULT

public static final int MERGE_FACTOR_DEFAULT
See Also:
Constant Field Values

OPTIMIZE_FREQUENCY_CONFIG

public static final String OPTIMIZE_FREQUENCY_CONFIG
See Also:
Constant Field Values

OPTIMIZE_FREQUENCY_PARAMETER

public static final String OPTIMIZE_FREQUENCY_PARAMETER
See Also:
Constant Field Values

OPTIMIZE_FREQUENCY_DEFAULT

public static final int OPTIMIZE_FREQUENCY_DEFAULT
See Also:
Constant Field Values

MAX_FIELD_LENGTH_CONFIG

public static final String MAX_FIELD_LENGTH_CONFIG
See Also:
Constant Field Values

MAX_FIELD_LENGTH_PARAMETER

public static final String MAX_FIELD_LENGTH_PARAMETER
See Also:
Constant Field Values

MAX_FIELD_LENGTH_DEFAULT

public static final int MAX_FIELD_LENGTH_DEFAULT

LUCENE_URI

public static final String LUCENE_URI
See Also:
Constant Field Values

LUCENE_QUERY_ELEMENT

public static final String LUCENE_QUERY_ELEMENT
See Also:
Constant Field Values

LUCENE_QUERY_ANALYZER_ATTRIBUTE

public static final String LUCENE_QUERY_ANALYZER_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_QUERY_DIRECTORY_ATTRIBUTE

public static final String LUCENE_QUERY_DIRECTORY_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_QUERY_CREATE_ATTRIBUTE

public static final String LUCENE_QUERY_CREATE_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE

public static final String LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_QUERY_MAX_FIELD_LENGTH_ATTRIBUTE

public static final String LUCENE_QUERY_MAX_FIELD_LENGTH_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_QUERY_OPTIMIZE_FREQUENCY_CONFIG_ATTRIBUTE

public static final String LUCENE_QUERY_OPTIMIZE_FREQUENCY_CONFIG_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_DOCUMENT_ELEMENT

public static final String LUCENE_DOCUMENT_ELEMENT
See Also:
Constant Field Values

LUCENE_DOCUMENT_URL_ATTRIBUTE

public static final String LUCENE_DOCUMENT_URL_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_ELEMENT_ATTR_TO_TEXT_ATTRIBUTE

public static final String LUCENE_ELEMENT_ATTR_TO_TEXT_ATTRIBUTE
See Also:
Constant Field Values

LUCENE_ELEMENT_ATTR_STORE_VALUE

public static final String LUCENE_ELEMENT_ATTR_STORE_VALUE
See Also:
Constant Field Values

LUCENE_ELAPSED_TIME_ATTRIBUTE

public static final String LUCENE_ELAPSED_TIME_ATTRIBUTE
See Also:
Constant Field Values

CDATA

public static final String CDATA
See Also:
Constant Field Values

workDir

protected File workDir
Constructor Detail

LuceneIndexTransformer

public LuceneIndexTransformer()
Method Detail

configure

public void configure(Configuration conf)
               throws ConfigurationException
Configure the transformer. The configuration parameters are stored as general defaults, which may be overridden by parameters specified in the sitemap pipeline, or by attributes of the lucene:index element in the XML input document.

Specified by:
configure in interface Configurable
Throws:
ConfigurationException

setup

public void setup(SourceResolver resolver,
                  Map objectModel,
                  String src,
                  Parameters parameters)
           throws ProcessingException,
                  SAXException,
                  IOException
Set up the transformer. Called when the pipeline is assembled. The parameters are those specified as child elements of the <map:transform> element in the sitemap. These parameters are optional: if none are specified here, the defaults are supplied by the component configuration. Any parameters specified here may be overridden by attributes of the lucene:index element in the input document.

Specified by:
setup in interface SitemapModelComponent
Throws:
ProcessingException
SAXException
IOException
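
The override order described for configure and setup (component configuration first, then sitemap parameters, then attributes of the lucene:index element) can be pictured as layered maps. This is a sketch under that assumption, not the transformer's implementation; the class SettingPrecedence and the method resolve are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating the three-level override order; not part of Cocoon. */
final class SettingPrecedence {

    /** Returns the effective settings; each later layer overrides the earlier ones. */
    static Map<String, String> resolve(Map<String, String> componentConfig,
                                       Map<String, String> sitemapParameters,
                                       Map<String, String> indexAttributes) {
        Map<String, String> effective = new HashMap<>(componentConfig);
        effective.putAll(sitemapParameters); // pipeline parameters override configuration
        effective.putAll(indexAttributes);   // document attributes override both
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("directory", "index", "merge-factor", "20");
        Map<String, String> params = Map.of("merge-factor", "50");
        Map<String, String> attrs = Map.of("directory", "news-index");
        System.out.println(resolve(config, params, attrs));
    }
}
```

In this example the effective directory comes from the document attribute and the effective merge-factor from the sitemap parameter, while any setting left untouched keeps its configured default.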

contextualize

public void contextualize(Context context)
                   throws ContextException
Contextualize this class.

Specified by:
contextualize in interface Contextualizable
Throws:
ContextException

recycle

public void recycle()
Description copied from class: AbstractXMLProducer
Recycle the producer by removing references, and resetting handlers to null (empty) implementations.

Specified by:
recycle in interface Recyclable
Overrides:
recycle in class AbstractXMLProducer
See Also:
AbstractXMLProducer.recycle()

getKey

public Serializable getKey()
Generate the unique key. This key must be unique inside the space of this component.

Specified by:
getKey in interface CacheableProcessingComponent
Returns:
The generated key

getValidity

public SourceValidity getValidity()
Generate the validity object.

Specified by:
getValidity in interface CacheableProcessingComponent
Returns:
The generated validity object or null if the component is currently not cacheable.

startDocument

public void startDocument()
                   throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the beginning of a document.

Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class AbstractXMLPipe
Throws:
SAXException

endDocument

public void endDocument()
                 throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the end of a document.

Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class AbstractXMLPipe
Throws:
SAXException

startPrefixMapping

public void startPrefixMapping(String prefix,
                               String uri)
                        throws SAXException
Begin the scope of a prefix-URI Namespace mapping.

Specified by:
startPrefixMapping in interface ContentHandler
Overrides:
startPrefixMapping in class AbstractXMLPipe
Parameters:
prefix - The Namespace prefix being declared.
uri - The Namespace URI the prefix is mapped to.
Throws:
SAXException

endPrefixMapping

public void endPrefixMapping(String prefix)
                      throws SAXException
End the scope of a prefix-URI mapping.

Specified by:
endPrefixMapping in interface ContentHandler
Overrides:
endPrefixMapping in class AbstractXMLPipe
Parameters:
prefix - The prefix that was being mapped.
Throws:
SAXException

startElement

public void startElement(String namespaceURI,
                         String localName,
                         String qName,
                         Attributes atts)
                  throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the beginning of an element.

Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class AbstractXMLPipe
Parameters:
namespaceURI - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
atts - The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
Throws:
SAXException

endElement

public void endElement(String namespaceURI,
                       String localName,
                       String qName)
                throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the end of an element.

Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class AbstractXMLPipe
Parameters:
namespaceURI - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
Throws:
SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of character data.

Specified by:
characters in interface ContentHandler
Overrides:
characters in class AbstractXMLPipe
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws:
SAXException
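
A SAX parser may deliver the text of a single element through several characters calls, so a handler that needs the whole text has to accumulate it. The following sketch shows that general SAX pattern with the JDK parser only; it does not reproduce how LuceneIndexTransformer itself buffers text. TextCollector and textOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that gathers all character data of a document; not part of Cocoon. */
final class TextCollector extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of ch belongs to this event.
        text.append(ch, start, length);
    }

    String text() {
        return text.toString();
    }

    /** Parses the given XML and returns its concatenated character data. */
    static String textOf(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            TextCollector collector = new TextCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
            return collector.text();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<body>The quick &amp; brown fox</body>"));
    }
}
```

Appending on every call, rather than treating one call as the complete text, keeps the result correct no matter how the parser chooses to split the data.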

getAnalyzer

public String getAnalyzer()
Returns:
the analyzer

setAnalyzer

public void setAnalyzer(String analyzer)
Parameters:
analyzer - the analyzer to set

getDirectory

public String getDirectory()
Returns:
the directory

setDirectory

public void setDirectory(String directory)
Parameters:
directory - the directory to set

getMergeFactor

public int getMergeFactor()
Returns:
the mergeFactor

setMergeFactor

public void setMergeFactor(int mergeFactor)
Parameters:
mergeFactor - the mergeFactor to set

getMaxFieldLength

public int getMaxFieldLength()
Returns:
the maxFieldLength

setMaxFieldLength

public void setMaxFieldLength(int maxFieldLength)
Parameters:
maxFieldLength - the maxFieldLength to set

getOptimizeFrequency

public int getOptimizeFrequency()
Returns:
the optimizeFrequency

setOptimizeFrequency

public void setOptimizeFrequency(int optimizeFrequency)
Parameters:
optimizeFrequency - the optimizeFrequency to set


Copyright © 1999-2010 The Apache Software Foundation. All Rights Reserved.