Goal
This document explains the basic caching algorithm of Apache Cocoon.
Overview
The caching algorithm of Cocoon has a flexible and powerful design.
The algorithms and components it uses are not hardcoded into the core of
Cocoon; instead, they are configured as Avalon components.
This document describes the components available for caching,
how to configure them, and how to implement your own cacheable components.
Caching of event pipelines
The caching algorithm used depends on the configured event pipeline.
For more information about configuration, see the Configuration chapter below.
The following subchapters describe the available caching algorithms.
The CachingEventPipeline
The CachingEventPipeline uses a simple but effective approach
to cache the event pipelines of a request: the pipeline is cached
up to the furthest possible point.
Each sitemap component (generator or transformer) that might be
cacheable must implement the Cacheable interface. When the
event pipeline is processed, each sitemap component, starting with
the generator, is asked whether it implements this interface. This
test stops either at the first component that does not implement
the Cacheable interface or at the first cacheable component that is
currently not cacheable for some reason (more about this in a moment).
The Cacheable interface declares a method generateKey()
which must produce a unique key for this sitemap component within
the component space. For example, the FileGenerator generates a hash
of the source argument (the XML document that is read). All parameters
and values that the component uses to process the request must be
included in this key. If, for example, the component uses the request
parameters, it must build a key that takes the current request
parameters into account.
If for any reason the sitemap component detects that the current request
is not cacheable, it can simply return 0 as the key. This has
the same effect as not implementing the Cacheable interface.
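As a rough illustration of these key rules, the following Java sketch shows a component whose output depends on its src and on the request parameters, so both go into the key. KeyedComponent and ParameterizedGeneratorSketch are hypothetical stand-ins, not Cocoon's Cacheable API; the 64-bit FNV-1a hash replaces HashUtil, and the "nocache" parameter is an invented trigger for returning 0.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the key part of the Cacheable contract:
// a non-zero key identifies the output, 0 means "not cacheable right now".
interface KeyedComponent {
    long generateKey();
}

// Hypothetical generator whose output depends on src and request parameters.
final class ParameterizedGeneratorSketch implements KeyedComponent {
    private final String src;
    private final Map<String, String> requestParams;

    ParameterizedGeneratorSketch(String src, Map<String, String> requestParams) {
        this.src = src;
        // Sorted copy, so the key does not depend on parameter order.
        this.requestParams = new TreeMap<>(requestParams);
    }

    @Override
    public long generateKey() {
        // Invented rule for illustration: "nocache" makes the request uncacheable.
        if (requestParams.containsKey("nocache")) {
            return 0L;
        }
        StringBuilder canonical = new StringBuilder(src);
        for (Map.Entry<String, String> e : requestParams.entrySet()) {
            canonical.append('\u0000').append(e.getKey())
                     .append('=').append(e.getValue());
        }
        long h = fnv1a64(canonical);
        // 0 is reserved for "not cacheable", so never return it as a real key.
        return h == 0L ? 1L : h;
    }

    // 64-bit FNV-1a over UTF-16 code units (used here instead of HashUtil).
    private static long fnv1a64(CharSequence s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

Sorting the parameters keeps the key stable regardless of the order in which they arrive, and mapping a genuine hash of 0 to 1 keeps real keys from colliding with the "not cacheable" marker.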
Once the key has been built for this particular request, the cache is
checked for an entry under that key. If there is none, the response is
generated and cached for subsequent requests.
If a cached response is found for the key, the caching algorithm checks
whether this response is still valid. For this check, each cacheable
component returns a validity object when its generateValidity() method
is invoked. (If a cacheable component returns null, it is temporarily
not cacheable, just as if it had returned 0 as the key.)
A CacheValidity object contains all the information the component
needs to verify whether the cached content is still valid. For example,
the file generator stores the last modification date of the parsed XML
document in the validity object.
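A timestamp-based validity like the one the file generator uses could look roughly like this. ValiditySketch, TimeStampValiditySketch and FileValiditySource are hypothetical names; they only model the behaviour described above: compare the stored and current last-modification dates, and return null while the component is temporarily not cacheable.

```java
import java.io.File;

// Hypothetical stand-in for CacheValidity: decides whether a stored
// validity still matches a freshly generated one.
interface ValiditySketch {
    boolean isValid(ValiditySketch current);
}

// Validity based on a last-modification timestamp.
final class TimeStampValiditySketch implements ValiditySketch {
    private final long timeStamp;

    TimeStampValiditySketch(long timeStamp) {
        this.timeStamp = timeStamp;
    }

    @Override
    public boolean isValid(ValiditySketch current) {
        return current instanceof TimeStampValiditySketch
                && ((TimeStampValiditySketch) current).timeStamp == timeStamp;
    }
}

// Hypothetical file-based component showing only the validity part.
final class FileValiditySource {
    private final File source;

    FileValiditySource(File source) {
        this.source = source;
    }

    // File.lastModified() reports 0L for a missing file; in that case the
    // component is treated as temporarily not cacheable and returns null.
    ValiditySketch generateValidity() {
        long lastModified = source.lastModified();
        return lastModified == 0L ? null : new TimeStampValiditySketch(lastModified);
    }
}
```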
When a response is cached, all validity objects are stored together with
the cached response in the cache. More precisely, a CachedEventObject,
which encapsulates all this information, is stored.
When a request is processed and the key has been built, the caching
algorithm also collects all up-to-date cache validity objects. If a
cached response is found in the cache, its stored validity objects are
compared with these. If they are valid (or equal), the cached response
is used and fed into the pipeline. If they are no longer valid, the
cached response is removed from the cache, and a new response is
generated and stored in the cache together with the new validity objects.
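The lookup, validation and regeneration cycle can be sketched in Java as follows. This is a simplified, single-threaded model: ValidatingCache and its nested Validity interface are hypothetical, a plain HashMap stands in for the configured event cache, and the stored and current validity lists are compared position by position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the lookup / validate / regenerate cycle.
final class ValidatingCache<R> {

    // Stand-in for CacheValidity: compares a stored validity with a fresh one.
    interface Validity {
        boolean isValid(Validity current);
    }

    // Plays the role of CachedEventObject: cached result plus its validities.
    private static final class Entry<T> {
        final T result;
        final List<Validity> validities;

        Entry(T result, List<Validity> validities) {
            this.result = result;
            this.validities = validities;
        }
    }

    private final Map<Long, Entry<R>> store = new HashMap<>();

    // key: combined pipeline key; current: freshly collected validities;
    // generator: produces the response when no valid cached copy exists.
    R getOrGenerate(long key, List<Validity> current, Supplier<R> generator) {
        Entry<R> cached = store.get(key);
        if (cached != null && allValid(cached.validities, current)) {
            return cached.result;               // still valid: reuse it
        }
        store.remove(key);                      // missing or stale: drop old entry
        R fresh = generator.get();
        store.put(key, new Entry<>(fresh, new ArrayList<>(current)));
        return fresh;
    }

    private static boolean allValid(List<Validity> stored, List<Validity> current) {
        if (stored.size() != current.size()) {
            return false;
        }
        for (int i = 0; i < stored.size(); i++) {
            if (!stored.get(i).isValid(current.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```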
Examples
If you have the following pipeline:
Generator[type=file|src=a.xml] -> Transformer[type="xslt"|src=a.xsl] -> Serializer
The file generator is cacheable and builds its key by hashing the src
(that is, the filename). Its cache validity object uses the last
modification date of the XML file.
The XSLT transformer is cacheable and builds its key by hashing
the stylesheet filename. Its cache validity object uses the last
modification date of the XSL file.
Both keys are combined into a unique key for this pipeline. The first
time the pipeline is invoked, its response is cached. The second time
it is called, the cached content is retrieved from the cache and, if it
is still valid, fed directly into the serializer.
Only part of the following pipeline is cached:
Generator[type=file|src=a.xml] -> Transformer[type="xslt"|src=a.xsl] -> Transformer[type=sql] -> Transformer[type="xslt"|src=b.xsl] -> Serializer
The file generator is cacheable and builds its key by hashing the src
(that is, the filename). Its cache validity object uses the last
modification date of the XML file.
The XSLT transformer is cacheable and builds its key by hashing
the stylesheet filename. Its cache validity object uses the last
modification date of the XSL file.
The SQL transformer is not cacheable, so the caching algorithm stops
at this point, even though the last transformer is cacheable.
The cached response is therefore exactly the same as in the first
example, and so is the unique key built from the two keys (from the
generator and the first transformer).
The only difference is where the cached response is used: it is not
fed into the serializer but into the SQL transformer.
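The prefix rule in both examples can be modelled with the short Java sketch below. Stage, CacheableStage and PrefixKey are hypothetical, and the way component keys are combined (seed 17, multiplier 31) is an assumption, not necessarily how CachingEventPipeline builds its key.

```java
import java.util.List;

// Hypothetical pipeline stage; stages that are not CacheableStage
// instances end the cacheable prefix.
interface Stage {}

// Stand-in for a component implementing Cacheable; 0 means "not cacheable now".
interface CacheableStage extends Stage {
    long generateKey();
}

// How many leading stages are cached, and their combined key.
final class PrefixKey {
    final int cachedStages;
    final long key;

    PrefixKey(int cachedStages, long key) {
        this.cachedStages = cachedStages;
        this.key = key;
    }

    // Walks from the generator onwards and stops at the first stage that
    // is not cacheable, combining the keys collected so far.
    static PrefixKey compute(List<? extends Stage> stages) {
        long combined = 17L;                    // arbitrary seed (assumption)
        int count = 0;
        for (Stage stage : stages) {
            if (!(stage instanceof CacheableStage)) {
                break;                          // e.g. the SQL transformer
            }
            long k = ((CacheableStage) stage).generateKey();
            if (k == 0L) {
                break;                          // temporarily not cacheable
            }
            combined = 31L * combined + k;      // order-sensitive mix (assumption)
            count++;
        }
        return new PrefixKey(count, count == 0 ? 0L : combined);
    }
}
```

For the two example pipelines, which share the generator and the first transformer, compute yields two cached stages and the same combined key, which is why the second pipeline can reuse the events cached for the first.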
The XMLSerializer/XMLDeserializer
The caching of SAX events is implemented by two Avalon components:
the XMLSerializer and the XMLDeserializer. The XMLSerializer receives
SAX events and creates an object that the XMLDeserializer uses
to recreate these SAX events.
org.apache.cocoon.components.sax.XMLByteStreamCompiler
The XMLByteStreamCompiler compiles SAX events into a byte stream.
org.apache.cocoon.components.sax.XMLByteStreamInterpreter
The XMLByteStreamInterpreter is the counterpart of the
XMLByteStreamCompiler. It interprets the byte
stream and recreates the SAX events.
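The idea behind this compiler/interpreter pair, recording SAX events once and replaying them later, is sketched below with the standard org.xml.sax API. It is a simplification: SaxEventRecorder is a hypothetical class that keeps the events as in-memory objects rather than compiling them into a compact byte stream as XMLByteStreamCompiler does, and it records only the common ContentHandler callbacks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical recorder: captures SAX events and replays them later.
final class SaxEventRecorder extends DefaultHandler {

    // One recorded SAX call that can be re-sent to another handler.
    private interface SaxEvent {
        void replay(ContentHandler target) throws SAXException;
    }

    private final List<SaxEvent> events = new ArrayList<>();

    @Override public void startDocument() { events.add(ContentHandler::startDocument); }

    @Override public void endDocument() { events.add(ContentHandler::endDocument); }

    @Override public void startPrefixMapping(String prefix, String uri) {
        events.add(h -> h.startPrefixMapping(prefix, uri));
    }

    @Override public void endPrefixMapping(String prefix) {
        events.add(h -> h.endPrefixMapping(prefix));
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Parsers may reuse the Attributes object, so keep a copy.
        Attributes copy = new AttributesImpl(atts);
        events.add(h -> h.startElement(uri, localName, qName, copy));
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add(h -> h.endElement(uri, localName, qName));
    }

    @Override public void characters(char[] ch, int start, int length) {
        // The parser's buffer is reused too, so copy the relevant range.
        char[] text = Arrays.copyOfRange(ch, start, start + length);
        events.add(h -> h.characters(text, 0, text.length));
    }

    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        char[] text = Arrays.copyOfRange(ch, start, start + length);
        events.add(h -> h.ignorableWhitespace(text, 0, text.length));
    }

    @Override public void processingInstruction(String target, String data) {
        events.add(h -> h.processingInstruction(target, data));
    }

    // Interpreter side: feeds the recorded events to the next consumer.
    void replay(ContentHandler target) throws SAXException {
        for (SaxEvent e : events) {
            e.replay(target);
        }
    }
}
```

The recorder can be passed to any SAX parser as its handler; replay then sends the same events to the next consumer, which is the role cached events play when they are fed back into the pipeline.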
The Event Cache
The event cache contains the cached event pipelines (that is, the
CachedEventObject instances). It is another configurable Avalon
component. The cache can be held in memory, on the file system, or in
a combination of both, depending on the event cache implementation
that is configured.
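How a memory-based event cache might be structured is sketched below. ObjectCache and MemoryObjectCache are hypothetical, and the fixed capacity with least-recently-used eviction is an assumption chosen for illustration; the text above does not say which eviction policy EventMemoryCache uses. A file-system or two-level cache could implement the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache contract shared by memory, file or combined caches.
interface ObjectCache<K, V> {
    V get(K key);
    void store(K key, V value);
    void remove(K key);
}

// Memory-only variant: bounded size, least-recently-used entry evicted first.
final class MemoryObjectCache<K, V> implements ObjectCache<K, V> {
    private final Map<K, V> entries;

    MemoryObjectCache(final int capacity) {
        // accessOrder = true moves each entry to the end when it is read.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override public synchronized V get(K key) { return entries.get(key); }

    @Override public synchronized void store(K key, V value) { entries.put(key, value); }

    @Override public synchronized void remove(K key) { entries.remove(key); }
}
```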
Caching of stream pipelines
The caching algorithm used depends on the configured stream pipeline.
For more information about configuration, see the Configuration chapter below.
The following subchapters describe the available caching algorithms.
The CachingStreamPipeline
The CachingStreamPipeline uses a simple but effective approach
to cache the stream pipelines of a request: if the underlying
event pipeline and the serializer are both cacheable, the response
is cached. If a reader is used instead and it is cacheable, the
response is cached, too.
An event pipeline is cacheable if it implements the CacheableEventPipeline
interface. Such a pipeline generates a unique key for itself
and delivers the cache validity objects. The current CachingEventPipeline,
for example, is cacheable if all of its sitemap components are cacheable,
including the generator and all transformers. The generated key
is built from the keys returned by the sitemap components, and
the validity objects are those collected from the sitemap components.
If the response is cacheable, the CachingStreamPipeline informs the
CacheableEventPipeline by calling its setStreamPipelineCaches method.
The event pipeline can then decide whether it also wants to cache its
own result, which would largely duplicate the cached content.
A serializer is cacheable if it implements the Cacheable interface.
For a serializer the implementation is usually very simple, as a
serializer often has no input other than the SAX events. In this case
the key for the serializer can be a simple constant value, and the
validity object is the NOPCacheValidity.
A reader is cacheable if it implements the Cacheable
interface.
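The decision the CachingStreamPipeline makes for the event-pipeline-plus-serializer case can be sketched as follows. All types here (StreamValidity, NopValidity, CacheableEventPipelineSketch, ConstantKeySerializer, StreamCacheDecision) are hypothetical stand-ins modelled on the description above, the key combination is an assumption, and the reader case is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for CacheValidity.
interface StreamValidity {
    boolean isValid(StreamValidity current);
}

// Like NOPCacheValidity: there is nothing to check, so it is always valid.
final class NopValidity implements StreamValidity {
    @Override public boolean isValid(StreamValidity current) { return true; }
}

// Hypothetical view of a CacheableEventPipeline.
interface CacheableEventPipelineSketch {
    long generateKey();                          // 0 = not cacheable
    List<StreamValidity> generateValidity();
    void setStreamPipelineCaches(boolean flag);
}

// Serializer whose only input is the SAX stream: constant key, NOP validity.
final class ConstantKeySerializer {
    static final long KEY = 1L;                  // arbitrary constant (assumption)
    long generateKey() { return KEY; }
    StreamValidity generateValidity() { return new NopValidity(); }
}

final class StreamCacheDecision {
    final long key;                              // 0 when the response is not cached
    final List<StreamValidity> validities;

    private StreamCacheDecision(long key, List<StreamValidity> validities) {
        this.key = key;
        this.validities = validities;
    }

    // Caches only if both the event pipeline and the serializer are cacheable.
    static StreamCacheDecision decide(CacheableEventPipelineSketch events,
                                      ConstantKeySerializer serializer) {
        long eventKey = events.generateKey();
        long serializerKey = serializer.generateKey();
        if (eventKey == 0L || serializerKey == 0L) {
            return new StreamCacheDecision(0L, Collections.emptyList());
        }
        List<StreamValidity> all = new ArrayList<>(events.generateValidity());
        all.add(serializer.generateValidity());
        events.setStreamPipelineCaches(true);    // tell the event pipeline
        long combined = 31L * eventKey + serializerKey;
        return new StreamCacheDecision(combined == 0L ? 1L : combined, all);
    }
}
```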
When a response is cached, all validity objects are stored in the cache
together with the cached response, which is actually a byte array.
The CachedStreamObject encapsulates all this information.
When a request is processed and the key has been built, the caching
algorithm collects all up-to-date cache validity objects. If a cached
response is found in the cache, its stored validity objects are compared
with these. If they are valid (or equal), the cached response is used
and returned directly. If they are no longer valid, the cached response
is removed from the cache, and a new response is generated and stored
in the cache together with the new validity objects.
The Stream Cache
The stream cache contains the cached stream pipelines (that is, the
CachedStreamObject instances). It is another configurable Avalon
component. The cache can be held in memory, on the file system, or in
a combination of both, depending on the stream cache implementation
that is configured.
Configuration
Caching in Cocoon is configured entirely through different Avalon
components. This chapter describes which roles can or must be changed
to tune your Cocoon system.
The Stream and the Event Pipeline
The stream and the event pipeline are represented by two Avalon
components which can be configured in the cocoon.xconf:
<event-pipeline
    class="org.apache.cocoon.components.pipeline.CachingEventPipeline"/>
<stream-pipeline
    class="org.apache.cocoon.components.pipeline.CachingStreamPipeline"/>
If you want to completely turn off caching, use the following
definitions:
<event-pipeline
    class="org.apache.cocoon.components.pipeline.NonCachingEventPipeline"/>
<stream-pipeline
    class="org.apache.cocoon.components.pipeline.NonCachingStreamPipeline"/>
The XMLSerializer/XMLDeserializer
The XMLSerializer and XMLDeserializer are two Avalon components that
can be configured in cocoon.xconf:
<xml-serializer
    class="org.apache.cocoon.components.sax.XMLByteStreamCompiler"/>
<xml-deserializer
    class="org.apache.cocoon.components.sax.XMLByteStreamInterpreter"/>
You must ensure that the configured deserializer matches the
configured serializer.
Event Cache and Stream Cache
The EventCache and the StreamCache are two Avalon components which
can be configured in the cocoon.xconf:
<event-cache class="org.apache.cocoon.caching.EventMemoryCache"/>
<stream-cache class="org.apache.cocoon.caching.StreamMemoryCache"/>
Java APIs
For more information on the Java APIs, refer directly to the
Cocoon Javadocs.
The most important packages are:
- org.apache.cocoon.caching: This package declares all interfaces for caching.
- org.apache.cocoon.components.pipeline: The interfaces and implementations of the pipelines.
Utility classes
Hash Util
The org.apache.cocoon.util.HashUtil class provides methods
implementing the BuzHash algorithm by Robert Uzgalis.
package org.apache.cocoon.util;

// API summary only: method bodies are omitted.
public class HashUtil {
    public static long hash(String arg);
    public static long hash(StringBuffer arg);
}
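For illustration, here is a small BuzHash-style (cyclic polynomial) hash in Java: the running value is rotated left by one bit and XORed with a random table entry for each input byte. It is not the code of HashUtil. BuzHashSketch is a hypothetical class, and the random table, its seed and the splitting of each 16-bit char into two bytes are assumptions, so the values it returns will differ from Cocoon's.

```java
import java.util.Random;

// Hypothetical BuzHash-style hash; values differ from Cocoon's HashUtil.
final class BuzHashSketch {
    private static final long[] TABLE = new long[256];

    static {
        Random random = new Random(0x5EEDL);     // fixed seed: deterministic table
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = random.nextLong();
        }
    }

    private BuzHashSketch() {}

    // Accepts String, StringBuffer and StringBuilder alike.
    static long hash(CharSequence arg) {
        long h = 0L;
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            h = Long.rotateLeft(h, 1) ^ TABLE[(c >>> 8) & 0xFF]; // high byte
            h = Long.rotateLeft(h, 1) ^ TABLE[c & 0xFF];         // low byte
        }
        return h;
    }
}
```

Taking a CharSequence covers both overloads listed above with one method; equal character sequences hash identically whatever their concrete type.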