java.lang.Object
  org.apache.avalon.framework.logger.AbstractLogEnabled
    org.apache.cocoon.xml.AbstractXMLProducer
      org.apache.cocoon.xml.AbstractXMLPipe
        org.apache.cocoon.transformation.AbstractTransformer
          org.apache.cocoon.transformation.LuceneIndexTransformer
public class LuceneIndexTransformer
A Lucene index creation transformer.
This transformer reads a document with elements in the namespace http://apache.org/cocoon/lucene/1.0, and creates a new Lucene index or updates an existing one.
It has several parameters, which can be set in the sitemap component configuration, as parameters to the transformation step in the pipeline, or as attributes of the root element in the source XML document. The source document overrides the transformation parameters, which in turn override any configuration parameters.
directory - Location of the directory where index files are stored. This path is relative to the Cocoon work directory.
create - Controls whether the index is recreated. If create="false" and the index already exists, the index is updated: any documents that had already been indexed are removed from the index and reinserted. If the index does not exist, it is created even if create="false". If create="true", any existing index is destroyed and a new index is created. If you are rebuilding your entire index, set create="true": the indexer then does not need to remove old documents from the index, so it will be faster.
max-field-length - Maximum number of terms to index in a field (as far as the index is concerned, the document is effectively truncated at this point). The default value, 10k, may not be sufficient for large documents.
analyzer - Class name of the Lucene text analyzer to use. Typically depends on the language of the text being indexed. See the Lucene documentation for more information.
merge-factor - Determines how often segment indices are merged. See the Lucene documentation for more information.
optimize-frequency - Determines how often the Lucene index is optimized. With thousands of documents, optimizing the index can become quite slow (e.g. 7 seconds for 9000 small documents on a P4).
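The create rules above reduce to a small decision table. The following is a minimal sketch of that table only, not the transformer's actual code; the class IndexModeSketch, the enum Mode, and both method names are hypothetical and exist purely for illustration.

```java
// Hypothetical illustration of the documented "create" semantics.
// None of these names belong to LuceneIndexTransformer.
final class IndexModeSketch {

    enum Mode { CREATE_NEW, UPDATE_EXISTING }

    // A new index is built when create="true" or when no index exists yet;
    // only an existing index with create="false" is updated in place.
    static Mode chooseMode(boolean create, boolean indexExists) {
        return (create || !indexExists) ? Mode.CREATE_NEW : Mode.UPDATE_EXISTING;
    }

    // Old copies of a document only need removing when updating in place,
    // which is why a full rebuild with create="true" is described as faster.
    static boolean mustRemoveOldDocuments(Mode mode) {
        return mode == Mode.UPDATE_EXISTING;
    }

    public static void main(String[] args) {
        boolean[] flags = { false, true };
        for (boolean create : flags) {
            for (boolean exists : flags) {
                Mode mode = chooseMode(create, exists);
                System.out.println("create=" + create + ", indexExists=" + exists
                        + " -> " + mode
                        + ", removeOld=" + mustRemoveOldDocuments(mode));
            }
        }
    }
}
```

Only the combination create="false" with an existing index leads to the update path, where previously indexed documents are removed before being reinserted.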
<?xml version="1.0" encoding="UTF-8"?>
<lucene:index xmlns:lucene="http://apache.org/cocoon/lucene/1.0"
              merge-factor="20"
              create="false"
              directory="index"
              max-field-length="10000"
              optimize-frequency="1"
              analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer">
  <lucene:document url="a.html">
    <documentTitle lucene:store="true">Doggerel</documentTitle>
    <body>The quick brown fox jumped over the lazy dog</body>
  </lucene:document>
  <lucene:document url="b.html">
    <documentTitle lucene:store="true">Lorem Ipsum</documentTitle>
    <body>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</body>
    <body>Nunc a mauris blandit ligula scelerisque tristique.</body>
  </lucene:document>
</lucene:index>
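The override order described earlier (attributes of the root element win over transformation parameters, which win over the component configuration) can be sketched as a three-level lookup. This is an illustration under assumptions, not the transformer's implementation: the class SettingPrecedenceSketch, the method resolve, and the sample values are all hypothetical, and plain maps stand in for Cocoon's Configuration, Parameters and Attributes objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; plain maps stand in for the three real sources of settings.
final class SettingPrecedenceSketch {

    // Checks the most specific source first and falls back step by step.
    static String resolve(String name,
                          Map<String, String> documentAttributes,
                          Map<String, String> transformParameters,
                          Map<String, String> componentConfiguration,
                          String builtInDefault) {
        String value = documentAttributes.get(name);
        if (value == null) {
            value = transformParameters.get(name);
        }
        if (value == null) {
            value = componentConfiguration.get(name);
        }
        return value != null ? value : builtInDefault;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, String> configuration = new HashMap<>();
        configuration.put("directory", "index");
        configuration.put("merge-factor", "10");

        Map<String, String> parameters = new HashMap<>();
        parameters.put("merge-factor", "15");

        Map<String, String> attributes = new HashMap<>();
        attributes.put("merge-factor", "20");

        // The attribute value is chosen because it is the most specific source.
        System.out.println(resolve("merge-factor", attributes, parameters, configuration, "10"));
        // Only the configuration defines a directory, so its value is used.
        System.out.println(resolve("directory", attributes, parameters, configuration, "index"));
    }
}
```

Under this reading, an attribute such as merge-factor="20" on the lucene:index element takes effect regardless of what the sitemap specifies.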
Fields inherited from class org.apache.cocoon.xml.AbstractXMLProducer |
---|
contentHandler, EMPTY_CONTENT_HANDLER, lexicalHandler, xmlConsumer |
Fields inherited from interface org.apache.cocoon.transformation.Transformer |
---|
ROLE |
Constructor Summary | |
---|---|
LuceneIndexTransformer() | |
Method Summary | |
---|---|
void | characters(char[] ch, int start, int length) - Receive notification of character data. |
void | configure(Configuration conf) - Configure the transformer. |
void | contextualize(Context context) - Contextualize this class. |
void | endDocument() - Receive notification of the end of a document. |
void | endElement(String namespaceURI, String localName, String qName) - Receive notification of the end of an element. |
void | endPrefixMapping(String prefix) - End the scope of a prefix-URI mapping. |
String | getAnalyzer() |
String | getDirectory() |
Serializable | getKey() - Generate the unique key. |
int | getMaxFieldLength() |
int | getMergeFactor() |
int | getOptimizeFrequency() |
SourceValidity | getValidity() - Generate the validity object. |
void | recycle() - Recycle the producer by removing references, and resetting handlers to null (empty) implementations. |
void | setAnalyzer(String analyzer) |
void | setDirectory(String directory) |
void | setMaxFieldLength(int maxFieldLength) |
void | setMergeFactor(int mergeFactor) |
void | setOptimizeFrequency(int optimizeFrequency) |
void | setup(SourceResolver resolver, Map objectModel, String src, Parameters parameters) - Setup the transformer. |
void | startDocument() - Receive notification of the beginning of a document. |
void | startElement(String namespaceURI, String localName, String qName, Attributes atts) - Receive notification of the beginning of an element. |
void | startPrefixMapping(String prefix, String uri) - Begin the scope of a prefix-URI Namespace mapping. |
Methods inherited from class org.apache.cocoon.xml.AbstractXMLPipe |
---|
comment, endCDATA, endDTD, endEntity, ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity, startCDATA, startDTD, startEntity |
Methods inherited from class org.apache.cocoon.xml.AbstractXMLProducer |
---|
setConsumer, setContentHandler, setLexicalHandler |
Methods inherited from class org.apache.avalon.framework.logger.AbstractLogEnabled |
---|
enableLogging, getLogger, setupLogger, setupLogger, setupLogger |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.xml.sax.ContentHandler |
---|
ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity |
Methods inherited from interface org.xml.sax.ext.LexicalHandler |
---|
comment, endCDATA, endDTD, endEntity, startCDATA, startDTD, startEntity |
Methods inherited from interface org.apache.cocoon.xml.XMLProducer |
---|
setConsumer |
Field Detail |
---|
public static final String ANALYZER_CLASSNAME_CONFIG
public static final String ANALYZER_CLASSNAME_PARAMETER
public static final String ANALYZER_CLASSNAME_DEFAULT
public static final String DIRECTORY_CONFIG
public static final String DIRECTORY_PARAMETER
public static final String DIRECTORY_DEFAULT
public static final String MERGE_FACTOR_CONFIG
public static final String MERGE_FACTOR_PARAMETER
public static final int MERGE_FACTOR_DEFAULT
public static final String OPTIMIZE_FREQUENCY_CONFIG
public static final String OPTIMIZE_FREQUENCY_PARAMETER
public static final int OPTIMIZE_FREQUENCY_DEFAULT
public static final String MAX_FIELD_LENGTH_CONFIG
public static final String MAX_FIELD_LENGTH_PARAMETER
public static final int MAX_FIELD_LENGTH_DEFAULT
public static final String LUCENE_URI
public static final String LUCENE_QUERY_ELEMENT
public static final String LUCENE_QUERY_ANALYZER_ATTRIBUTE
public static final String LUCENE_QUERY_DIRECTORY_ATTRIBUTE
public static final String LUCENE_QUERY_CREATE_ATTRIBUTE
public static final String LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE
public static final String LUCENE_QUERY_MAX_FIELD_LENGTH_ATTRIBUTE
public static final String LUCENE_QUERY_OPTIMIZE_FREQUENCY_CONFIG_ATTRIBUTE
public static final String LUCENE_DOCUMENT_ELEMENT
public static final String LUCENE_DOCUMENT_URL_ATTRIBUTE
public static final String LUCENE_ELEMENT_ATTR_TO_TEXT_ATTRIBUTE
public static final String LUCENE_ELEMENT_ATTR_STORE_VALUE
public static final String LUCENE_ELAPSED_TIME_ATTRIBUTE
public static final String CDATA
protected File workDir
Constructor Detail |
---|
public LuceneIndexTransformer()
Method Detail |
---|
public void configure(Configuration conf) throws ConfigurationException
Configure the transformer.
Specified by: configure in interface Configurable
Throws: ConfigurationException
public void setup(SourceResolver resolver, Map objectModel, String src, Parameters parameters) throws ProcessingException, SAXException, IOException
Setup the transformer. The parameters passed here are those specified on the <map:transform> element in the sitemap. These parameters are optional: if none are specified here, the defaults are supplied by the component configuration. Any parameters specified here may be overridden by attributes of the lucene:index element in the input document.
Specified by: setup in interface SitemapModelComponent
Throws: ProcessingException, SAXException, IOException
public void contextualize(Context context) throws ContextException
Contextualize this class.
Specified by: contextualize in interface Contextualizable
Throws: ContextException
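public void recycle()
Description copied from class: AbstractXMLProducer
Recycle the producer by removing references, and resetting handlers to null (empty) implementations.
Specified by: recycle in interface Recyclable
Overrides: recycle in class AbstractXMLProducer
See Also: AbstractXMLProducer.recycle()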
public Serializable getKey()
Generate the unique key.
Specified by: getKey in interface CacheableProcessingComponent
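public SourceValidity getValidity()
Generate the validity object.
Specified by: getValidity in interface CacheableProcessingComponent
Returns: the validity object, or null if the component is currently not cacheable.
public void startDocument() throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the beginning of a document.
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class AbstractXMLPipe
Throws: SAXException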
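public void endDocument() throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the end of a document.
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class AbstractXMLPipe
Throws: SAXException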
public void startPrefixMapping(String prefix, String uri) throws SAXException
Begin the scope of a prefix-URI Namespace mapping.
Specified by: startPrefixMapping in interface ContentHandler
Overrides: startPrefixMapping in class AbstractXMLPipe
Parameters:
prefix - The Namespace prefix being declared.
uri - The Namespace URI the prefix is mapped to.
Throws: SAXException
public void endPrefixMapping(String prefix) throws SAXException
End the scope of a prefix-URI mapping.
Specified by: endPrefixMapping in interface ContentHandler
Overrides: endPrefixMapping in class AbstractXMLPipe
Parameters:
prefix - The prefix that was being mapped.
Throws: SAXException
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the beginning of an element.
Specified by: startElement in interface ContentHandler
Overrides: startElement in class AbstractXMLPipe
Parameters:
namespaceURI - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
atts - The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
Throws: SAXException
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of the end of an element.
Specified by: endElement in interface ContentHandler
Overrides: endElement in class AbstractXMLPipe
Parameters:
namespaceURI - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
Throws: SAXException
public void characters(char[] ch, int start, int length) throws SAXException
Description copied from class: AbstractXMLPipe
Receive notification of character data.
Specified by: characters in interface ContentHandler
Overrides: characters in class AbstractXMLPipe
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws: SAXException
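To show how the startElement, characters and endElement callbacks cooperate on an input document such as the lucene:index example, here is a minimal standalone SAX handler that gathers the text of each field element per lucene:document. It is a sketch under assumptions, not the transformer's implementation: the class FieldCollectorSketch and its collect method are hypothetical, it uses only the JDK's SAX parser, and it assumes field elements are flat (no nested child elements).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records "url fieldName=text" for every field element.
final class FieldCollectorSketch extends DefaultHandler {

    static final String LUCENE_NS = "http://apache.org/cocoon/lucene/1.0";

    final List<String> fields = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentUrl; // non-null only while inside a lucene:document

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (LUCENE_NS.equals(uri) && "document".equals(localName)) {
            currentUrl = atts.getValue("", "url");
        } else if (currentUrl != null) {
            text.setLength(0); // a field starts: discard whitespace seen so far
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentUrl != null) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (LUCENE_NS.equals(uri) && "document".equals(localName)) {
            currentUrl = null;
        } else if (currentUrl != null) {
            fields.add(currentUrl + " " + localName + "=" + text.toString().trim());
            text.setLength(0);
        }
    }

    // Parses the XML with namespace processing enabled and returns the collected fields.
    static List<String> collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            FieldCollectorSketch handler = new FieldCollectorSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse input document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<lucene:index xmlns:lucene=\"" + LUCENE_NS + "\">"
                + "<lucene:document url=\"a.html\">"
                + "<documentTitle>Doggerel</documentTitle>"
                + "<body>The quick brown fox jumped over the lazy dog</body>"
                + "</lucene:document></lucene:index>";
        for (String field : collect(xml)) {
            System.out.println(field);
        }
    }
}
```

Because a SAX parser may deliver one text node through several characters calls, the text is buffered and only read in endElement.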
public String getAnalyzer()
public void setAnalyzer(String analyzer)
Parameters:
analyzer - the analyzer to set
public String getDirectory()
public void setDirectory(String directory)
Parameters:
directory - the directory to set
public int getMergeFactor()
public void setMergeFactor(int mergeFactor)
Parameters:
mergeFactor - the mergeFactor to set
public int getMaxFieldLength()
public void setMaxFieldLength(int maxFieldLength)
Parameters:
maxFieldLength - the maxFieldLength to set
public int getOptimizeFrequency()
public void setOptimizeFrequency(int optimizeFrequency)
Parameters:
optimizeFrequency - the optimizeFrequency to set