Caching

Goal

This document explains the basic caching algorithm of Apache Cocoon.

Overview

Cocoon's caching algorithm has a flexible and powerful design. The algorithms and components it uses are not hard-coded into the Cocoon core. Instead, they are provided as Avalon components that can be configured.

This document describes the components available for caching, how they can be configured and how to implement your own cacheable components.

Caching of event pipelines

The caching algorithm depends on the configured event pipeline. For more information about configuration, see the Configuration chapter below.

The following subchapters describe the available caching algorithms.

The CachingEventPipeline

The CachingEventPipeline uses a simple but effective approach to caching the event pipeline of a request: the pipeline is cached up to the last possible point.

Each sitemap component (generator or transformer) that might be cacheable must implement the Cacheable interface. When the event pipeline is processed, each sitemap component is checked in turn, starting with the generator, to see whether it implements this interface. This test stops at the first component that does not implement the Cacheable interface, or at the first cacheable component that is currently not cacheable for some reason (more about this in a moment).
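The following Java sketch illustrates this test. It walks a list of components in pipeline order and collects their keys. It stops at the first component that either does not implement the interface or returns 0 for its key; the 0-key convention is explained in the next paragraphs. The Cacheable interface here is a simplified local stand-in. FixedKeyComponent, PlainComponent and CacheablePrefix are hypothetical names and do not reproduce Cocoon's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the Cacheable interface: only the key method is
// modelled here (the real interface also provides generateValidity()).
interface Cacheable {
    /** A key unique within the component space, or 0 if not cacheable right now. */
    long generateKey();
}

// Hypothetical cacheable component with a fixed key, for illustration only.
final class FixedKeyComponent implements Cacheable {
    private final long key;

    FixedKeyComponent(long key) {
        this.key = key;
    }

    @Override
    public long generateKey() {
        return key;
    }
}

// Hypothetical component that does not implement Cacheable (e.g. an SQL transformer).
final class PlainComponent {
}

final class CacheablePrefix {
    private CacheablePrefix() {
    }

    /**
     * Asks each component in pipeline order (generator first) for its key and
     * returns the keys of the leading cacheable components.
     */
    static List<Long> keys(List<?> components) {
        List<Long> keys = new ArrayList<>();
        for (Object component : components) {
            if (!(component instanceof Cacheable)) {
                break; // not cacheable at all: stop here
            }
            long key = ((Cacheable) component).generateKey();
            if (key == 0) {
                break; // cacheable in principle, but not for this request
            }
            keys.add(key);
        }
        return keys;
    }
}
```

Components after the stopping point are never asked for a key, even if they are cacheable themselves.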

The Cacheable interface declares a method generateKey(), which must produce a key that is unique for this sitemap component within the component space. For example, the FileGenerator generates a hash of the source argument (the XML document that is read). All parameters and values that the component uses to process the request must be included in this key. If, for example, the component uses the request parameters, it must build its key from the current request parameters.

If, for any reason, the sitemap component detects that the current request is not cacheable, it can simply return 0 as the key. This has the same effect as not implementing the Cacheable interface.
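The following sketch applies these key rules to a hypothetical generator that reads a source document and also depends on request parameters. ParameterAwareGenerator and its 64-bit polynomial hash are assumptions of this sketch and do not reproduce the FileGenerator's actual key computation. It shows three things: every input that affects the output goes into the key, parameter order does not change the key, and 0 is reserved for "not cacheable".

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical generator that reads `source` and also uses request parameters,
// so both must be part of its cache key. Names and values are assumed non-null.
final class ParameterAwareGenerator {
    private final String source;
    private final Map<String, String> requestParameters;

    ParameterAwareGenerator(String source, Map<String, String> requestParameters) {
        this.source = source;
        this.requestParameters = requestParameters;
    }

    /** Returns a key covering every input that affects the output, or 0 if not cacheable. */
    long generateKey() {
        if (source == null) {
            return 0; // nothing stable to key on: behave as if not Cacheable
        }
        StringBuilder input = new StringBuilder();
        append(input, source);
        // Sort the parameters so that ?a=1&b=2 and ?b=2&a=1 yield the same key.
        for (Map.Entry<String, String> e : new TreeMap<>(requestParameters).entrySet()) {
            append(input, e.getKey());
            append(input, e.getValue());
        }
        long key = hash64(input);
        return key == 0 ? 1 : key; // 0 is reserved for "not cacheable"
    }

    // Length-prefix each part so that different splits cannot produce the same input.
    private static void append(StringBuilder sb, String part) {
        sb.append(part.length()).append(':').append(part);
    }

    // Simple 64-bit polynomial string hash (illustrative only).
    private static long hash64(CharSequence s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

Mapping a zero hash to 1 prevents a cacheable request from being mistaken for a non-cacheable one.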

Once the key has been built for this particular request, the cache is checked for an entry with that key. If there is none, the response is generated and cached for subsequent requests.

If a cached response is found for the key, the caching algorithm checks whether it is still valid. For this check, each cacheable component returns a validity object when its generateValidity method is invoked. (If a cacheable component returns null, it is temporarily not cacheable, just as if it had returned 0 for the key.)

A CacheValidity object contains all the information the component needs to verify whether the cached content is still valid. For example, the file generator stores the last modification date of the parsed XML document in its validity object.
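A validity object based on a last modification date might look like the following sketch. CacheValidity is declared here as a simplified stand-in with a single isValid method, and LastModifiedValidity is a hypothetical name. The check is simple equality of the stored and current timestamps.

```java
import java.io.File;

// Simplified stand-in for a CacheValidity (hypothetical signature).
interface CacheValidity {
    /** True if content stored with this validity is still valid, given the current validity. */
    boolean isValid(CacheValidity current);
}

// Hypothetical validity that records a file's last modification date,
// as described above for the file generator.
final class LastModifiedValidity implements CacheValidity {
    private final long lastModified;

    LastModifiedValidity(long lastModified) {
        this.lastModified = lastModified;
    }

    /** Captures the current modification date of the file (0 if it does not exist). */
    static LastModifiedValidity of(File file) {
        return new LastModifiedValidity(file.lastModified());
    }

    @Override
    public boolean isValid(CacheValidity current) {
        return current instanceof LastModifiedValidity
                && ((LastModifiedValidity) current).lastModified == lastModified;
    }
}
```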

When a response is cached, all validity objects are stored in the cache together with it. What is actually stored is a CachedEventObject, which encapsulates all of this information.

When a request is processed and its key has been built, the caching algorithm also collects all current cache validity objects. If a cached response is found in the cache, its stored validity objects are compared with these current ones. If they are valid (or equal), the cached response is used and fed into the pipeline. If they are no longer valid, the cached response is removed from the cache, a new response is generated, and it is stored in the cache together with the new validity objects.
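The lookup, validation and regeneration cycle from the last few paragraphs can be sketched as a small in-memory cache. Validity, StampValidity, CachedEntry and ValidatingCache are hypothetical names. CachedEntry plays the role of the CachedEventObject, and a HashMap stands in for the configurable event cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-in for a cache validity object.
interface Validity {
    /** True if content stored with this validity is still valid, given the current one. */
    boolean isValid(Validity current);
}

// Hypothetical timestamp-based validity used in the usage example.
final class StampValidity implements Validity {
    final long stamp;

    StampValidity(long stamp) {
        this.stamp = stamp;
    }

    @Override
    public boolean isValid(Validity current) {
        return current instanceof StampValidity && ((StampValidity) current).stamp == stamp;
    }
}

// Plays the role of CachedEventObject: the response plus the validities it was stored with.
final class CachedEntry<T> {
    final T response;
    final List<Validity> validities;

    CachedEntry(T response, List<Validity> validities) {
        this.response = response;
        this.validities = new ArrayList<>(validities);
    }
}

final class ValidatingCache<T> {
    private final Map<Long, CachedEntry<T>> store = new HashMap<>();

    /** Returns a still-valid cached response, or generates, stores and returns a new one. */
    T get(long key, List<Validity> current, Supplier<T> generate) {
        CachedEntry<T> cached = store.get(key);
        if (cached != null && allValid(cached.validities, current)) {
            return cached.response; // cache hit: reuse the stored response
        }
        store.remove(key); // missing or stale: drop any old entry
        T fresh = generate.get();
        store.put(key, new CachedEntry<>(fresh, current));
        return fresh;
    }

    // Every stored validity must accept the corresponding current validity.
    private static boolean allValid(List<Validity> stored, List<Validity> current) {
        if (stored.size() != current.size()) {
            return false;
        }
        for (int i = 0; i < stored.size(); i++) {
            if (!stored.get(i).isValid(current.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```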

Examples

If you have the following pipeline:

Generator[type=file|src=a.xml] -> Transformer[type="xslt"|src=a.xsl] -> Serializer

The file generator is cacheable; its key is a hash of the src (the filename). Its cache validity object uses the last modification date of the XML file.

The XSLT transformer is cacheable; its key is a hash of the stylesheet filename. Its cache validity object uses the last modification date of the XSL file.

Both keys are combined into a unique key for this pipeline. The first time the pipeline is invoked, its response is cached. The second time, the cached content is retrieved from the cache and, if it is still valid, fed directly into the serializer.

Only part of the following pipeline is cached:

Generator[type=file|src=a.xml] -> Transformer[type="xslt"|src=a.xsl] -> Transformer[type=sql] -> Transformer[type="xslt"|src=b.xsl] -> Serializer

The file generator is cacheable; its key is a hash of the src (the filename). Its cache validity object uses the last modification date of the XML file.

The XSLT transformer is cacheable; its key is a hash of the stylesheet filename. Its cache validity object uses the last modification date of the XSL file.

The SQL transformer is not cacheable, so the caching algorithm stops at this point, even though the last transformer is cacheable.

The cached response is therefore exactly the same as in the first example, and so is the unique key built from the two keys (those of the generator and the first transformer). The only difference is where the cached response is used: it is fed not into the serializer but into the SQL transformer.
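The two examples can be checked against each other with a small model. Step and PipelineKey are hypothetical, and each component's key is simply a hash of its type and src. The pipeline key is built only from the cacheable prefix, so both pipelines end up with the same key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one sitemap component from the examples above.
final class Step {
    final String type;
    final String src;
    final boolean cacheable;

    Step(String type, String src, boolean cacheable) {
        this.type = type;
        this.src = src;
        this.cacheable = cacheable;
    }

    // Illustrative per-component key: a hash of the component's type and src.
    long key() {
        return (type + ":" + src).hashCode();
    }
}

final class PipelineKey {
    private PipelineKey() {
    }

    /** Number of leading cacheable steps, i.e. how far the pipeline can be cached. */
    static int cachedLength(List<Step> steps) {
        int n = 0;
        while (n < steps.size() && steps.get(n).cacheable) {
            n++;
        }
        return n;
    }

    /** Combines the keys of the cacheable prefix into one pipeline key. */
    static String of(List<Step> steps) {
        List<String> parts = new ArrayList<>();
        for (Step step : steps.subList(0, cachedLength(steps))) {
            parts.add(step.type + "-" + step.key());
        }
        return String.join("/", parts);
    }
}
```

In the second pipeline, cachedLength returns 2. The cached events are therefore fed into the step at index 2, the SQL transformer.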

The XMLSerializer/XMLDeserializer

The caching of SAX events is implemented by two Avalon components: the XMLSerializer and the XMLDeserializer. The XMLSerializer receives SAX events and creates an object that the XMLDeserializer uses to recreate those SAX events.

org.apache.cocoon.components.sax.XMLByteStreamCompiler

The XMLByteStreamCompiler compiles SAX events into a byte stream.

org.apache.cocoon.components.sax.XMLByteStreamInterpreter

The XMLByteStreamInterpreter is the counterpart of the XMLByteStreamCompiler: it interprets the byte stream and recreates the SAX events.
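The sketch below illustrates the compile/interpret idea using only the JDK's SAX API. It records a subset of SAX events into a byte array and replays them into any ContentHandler. SaxRecorder and SaxReplayer are hypothetical, greatly simplified analogues of XMLByteStreamCompiler and XMLByteStreamInterpreter. The byte format is invented for this example and does not match Cocoon's. Namespace mappings and processing instructions are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Records document, element and character events as bytes (simplified sketch).
class SaxRecorder extends DefaultHandler {
    static final byte START_DOCUMENT = 0, END_DOCUMENT = 1, START_ELEMENT = 2, END_ELEMENT = 3, CHARACTERS = 4;
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    private interface Write { void run() throws IOException; }

    // Wraps the IOException declared (but never thrown) by the in-memory stream.
    private void emit(Write write) throws SAXException {
        try { write.run(); } catch (IOException e) { throw new SAXException(e); }
    }

    // A string is stored as its length followed by its UTF-16 chars; null becomes "".
    private void writeString(String s) throws IOException {
        String v = (s == null) ? "" : s;
        out.writeInt(v.length());
        out.writeChars(v);
    }

    @Override public void startDocument() throws SAXException { emit(() -> out.writeByte(START_DOCUMENT)); }
    @Override public void endDocument() throws SAXException { emit(() -> out.writeByte(END_DOCUMENT)); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        emit(() -> {
            out.writeByte(START_ELEMENT);
            writeString(uri); writeString(localName); writeString(qName);
            out.writeInt(atts.getLength());
            for (int i = 0; i < atts.getLength(); i++) {
                writeString(atts.getURI(i)); writeString(atts.getLocalName(i)); writeString(atts.getQName(i));
                writeString(atts.getType(i)); writeString(atts.getValue(i));
            }
        });
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        emit(() -> { out.writeByte(END_ELEMENT); writeString(uri); writeString(localName); writeString(qName); });
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        emit(() -> { out.writeByte(CHARACTERS); writeString(new String(ch, start, length)); });
    }

    byte[] toByteArray() { return bytes.toByteArray(); }
}

// Replays recorded events into any ContentHandler (simplified sketch).
final class SaxReplayer {
    private SaxReplayer() { }

    static void replay(byte[] data, ContentHandler handler) throws SAXException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            while (in.available() > 0) {
                byte event = in.readByte();
                switch (event) {
                    case SaxRecorder.START_DOCUMENT: handler.startDocument(); break;
                    case SaxRecorder.END_DOCUMENT: handler.endDocument(); break;
                    case SaxRecorder.START_ELEMENT: {
                        String uri = readString(in), localName = readString(in), qName = readString(in);
                        AttributesImpl atts = new AttributesImpl();
                        int count = in.readInt();
                        for (int i = 0; i < count; i++) {
                            // Arguments are evaluated left to right: uri, localName, qName, type, value.
                            atts.addAttribute(readString(in), readString(in), readString(in), readString(in), readString(in));
                        }
                        handler.startElement(uri, localName, qName, atts);
                        break;
                    }
                    case SaxRecorder.END_ELEMENT: handler.endElement(readString(in), readString(in), readString(in)); break;
                    case SaxRecorder.CHARACTERS: {
                        char[] text = readString(in).toCharArray();
                        handler.characters(text, 0, text.length);
                        break;
                    }
                    default: throw new SAXException("Unknown event code " + event);
                }
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    private static String readString(DataInputStream in) throws IOException {
        char[] chars = new char[in.readInt()];
        for (int i = 0; i < chars.length; i++) chars[i] = in.readChar();
        return new String(chars);
    }
}
```

To check that the two halves match, replay the events into a second recorder and compare the bytes. This is the same requirement as configuring a matching serializer and deserializer.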

The Event Cache

The event cache contains the cached event pipelines (that is, the CachedEventObjects). It is another configurable Avalon component. Depending on which event cache is configured, it can use memory, the file system, or a combination of both as its storage.

Caching of stream pipelines

The caching algorithm depends on the configured stream pipeline. For more information about configuration, see the Configuration chapter below.

The following subchapters describe the available caching algorithms.

The CachingStreamPipeline

The CachingStreamPipeline uses a simple but effective approach to caching the stream pipeline of a request: if both the underlying event pipeline and the serializer are cacheable, the response is cached. If a reader is used instead and it is cacheable, the response is cached, too.

An event pipeline is cacheable if it implements the CacheableEventPipeline interface. Such a pipeline generates a unique key for itself and delivers the cache validity objects. The current CachingEventPipeline, for example, is cacheable if all of its sitemap components (the generator and all transformers) are cacheable. Its key is built from the keys returned by the sitemap components, and its validity objects are the ones collected from those components. If the response is cacheable, the CachingStreamPipeline informs the CacheableEventPipeline by calling its setStreamPipelineCaches method. The event pipeline can then decide whether it also wants to cache its own output, which would nearly duplicate the cached contents.

A serializer is cacheable if it implements the Cacheable interface. For a serializer, the implementation is usually very simple, because a serializer often has no input other than the SAX events. In that case, its key can be a simple constant value and its validity object is the NOPCacheValidity.
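The following sketch shows how a stream pipeline key could be derived from an event pipeline and a serializer whose only input is the SAX events. All names are hypothetical:

- NopValidity stands in for NOPCacheValidity.
- EventPipelineState summarises what a CacheableEventPipeline delivers.
- The combined key is a plain string chosen for readability.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for a cache validity object.
interface Validity {
    boolean isValid(Validity current);
}

// Stand-in for NOPCacheValidity: the content never expires on its own account.
enum NopValidity implements Validity {
    INSTANCE;

    @Override
    public boolean isValid(Validity current) {
        return current == INSTANCE;
    }
}

// Hypothetical summary of a cacheable event pipeline: its combined key
// (0 if not cacheable) and the validity objects collected from its components.
final class EventPipelineState {
    final long key;
    final List<Validity> validities;

    EventPipelineState(long key, List<Validity> validities) {
        this.key = key;
        this.validities = Collections.unmodifiableList(new ArrayList<>(validities));
    }
}

// Hypothetical serializer whose only input is the SAX event stream.
final class EventOnlySerializer {
    long generateKey() {
        return 1L; // constant: the output depends on the events alone
    }

    Validity generateValidity() {
        return NopValidity.INSTANCE;
    }
}

final class StreamPipelineKey {
    private StreamPipelineKey() {
    }

    /** Combined key for the stream pipeline, or null if the response is not cacheable. */
    static String of(EventPipelineState events, EventOnlySerializer serializer) {
        if (events.key == 0) {
            return null;
        }
        return events.key + "/" + serializer.generateKey();
    }

    /** The event pipeline's validities followed by the serializer's validity. */
    static List<Validity> validities(EventPipelineState events, EventOnlySerializer serializer) {
        List<Validity> all = new ArrayList<>(events.validities);
        all.add(serializer.generateValidity());
        return all;
    }
}
```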

A reader is cacheable if it implements the Cacheable interface.

When a response is cached, all validity objects are stored in the cache together with the cached response, which is actually a byte array. The CachedStreamObject encapsulates all of this information.

When a request is processed and its key has been built, the caching algorithm collects all current cache validity objects. If a cached response is found in the cache, its stored validity objects are compared with these current ones. If they are valid (or equal), the cached response is used and returned directly. If they are no longer valid, the cached response is removed from the cache, a new response is generated, and it is stored in the cache together with the new validity objects.
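In the stream case, the cached object is the serialized byte array. The sketch below assumes the output is written to an OutputStream. It copies everything written to the client into a buffer and stores that buffer with the validities; on a hit, the stored bytes are written out directly. TeeOutputStream, ResponseWriter and StreamCacheSketch are hypothetical names. Validities are reduced to a list of timestamps compared for equality.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Forwards every byte to the client and keeps a copy for the cache.
final class TeeOutputStream extends OutputStream {
    private final OutputStream client;
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    TeeOutputStream(OutputStream client) {
        this.client = client;
    }

    @Override
    public void write(int b) throws IOException {
        client.write(b);
        copy.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        client.write(b, off, len);
        copy.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        client.flush();
    }

    byte[] captured() {
        return copy.toByteArray();
    }
}

// Hypothetical callback that serializes the response to the given stream.
interface ResponseWriter {
    void writeTo(OutputStream out) throws IOException;
}

final class StreamCacheSketch {
    // Plays the role of CachedStreamObject: the response bytes plus their validities.
    private static final class Entry {
        final byte[] bytes;
        final List<Long> validities;

        Entry(byte[] bytes, List<Long> validities) {
            this.bytes = bytes;
            this.validities = new ArrayList<>(validities);
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    /** Writes a still-valid cached response directly, or generates, streams and caches a new one. */
    void process(String key, List<Long> validities, ResponseWriter writer, OutputStream client)
            throws IOException {
        Entry cached = store.get(key);
        if (cached != null && cached.validities.equals(validities)) {
            client.write(cached.bytes); // cache hit: return the stored bytes as-is
            return;
        }
        store.remove(key); // missing or stale
        TeeOutputStream tee = new TeeOutputStream(client);
        writer.writeTo(tee);
        store.put(key, new Entry(tee.captured(), validities));
    }
}
```

Because the copy is taken while the response is streamed, the first request is not delayed by a separate caching pass.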

The Stream Cache

The stream cache contains the cached stream pipelines (that is, the CachedStreamObjects). It is another configurable Avalon component. Depending on which stream cache is configured, it can use memory, the file system, or a combination of both as its storage.

Configuration

Cocoon's caching is configured entirely through Avalon components. This chapter describes which roles can or must be changed to tune your Cocoon system.

The Stream and the Event Pipeline

The stream and the event pipeline are represented by two Avalon components which can be configured in the cocoon.xconf:

     
<event-pipeline
    class="org.apache.cocoon.components.pipeline.CachingEventPipeline"/>

<stream-pipeline
    class="org.apache.cocoon.components.pipeline.CachingStreamPipeline"/>
     
    

If you want to completely turn off caching, use the following definitions:

     
<event-pipeline
    class="org.apache.cocoon.components.pipeline.NonCachingEventPipeline"/>

<stream-pipeline
    class="org.apache.cocoon.components.pipeline.NonCachingStreamPipeline"/>
     
    
The XMLSerializer/XMLDeserializer

The XMLSerializer and XMLDeserializer are two Avalon components which can be configured in the cocoon.xconf:

     
<xml-serializer
    class="org.apache.cocoon.components.sax.XMLByteStreamCompiler"/>

<xml-deserializer
    class="org.apache.cocoon.components.sax.XMLByteStreamInterpreter"/>
     
    

You must ensure that the configured deserializer matches the configured serializer.

Event Cache and Stream Cache

The EventCache and the StreamCache are two Avalon components which can be configured in the cocoon.xconf:

     
<event-cache class="org.apache.cocoon.caching.EventMemoryCache"/>

<stream-cache class="org.apache.cocoon.caching.StreamMemoryCache"/>
     
    
Java APIs

For more information on the Java APIs, refer directly to the Cocoon Javadocs.

The most important packages are:

  1. org.apache.cocoon.caching: This package declares all interfaces for caching.
  2. org.apache.cocoon.components.pipeline: The interfaces and implementations of the pipelines.

Utility classes
Hash Util

The org.apache.cocoon.util.HashUtil class provides methods implementing the BuzHash algorithm by Robert Uzgalis.

     
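       package org.apache.cocoon.util;

       public class HashUtil {

         // Method bodies omitted: this listing shows the signatures only.
         // Both methods return a 64-bit (long) BuzHash of their argument.
         public static long hash(String arg);
         public static long hash(StringBuffer arg);
       }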
     
    
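The sketch below shows the core idea of BuzHash. It uses a table of random 64-bit values, one per byte value, and a running hash that is rotated left by one bit before each table entry is XORed in. The table is filled from java.util.Random with an arbitrary fixed seed, and the input is hashed as UTF-8 bytes. Both are assumptions of this sketch, so its values will not match those returned by HashUtil.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Minimal BuzHash sketch; the table contents are NOT those used by HashUtil.
final class BuzHashSketch {
    private static final long[] TABLE = new long[256];

    static {
        // One pseudo-random 64-bit value per possible byte value.
        Random random = new Random(0x5eedL);
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = random.nextLong();
        }
    }

    private BuzHashSketch() {
    }

    static long hash(String arg) {
        long h = 0;
        for (byte b : arg.getBytes(StandardCharsets.UTF_8)) {
            // Rotate the running hash left by one bit, then mix in this byte's table entry.
            h = Long.rotateLeft(h, 1) ^ TABLE[b & 0xff];
        }
        return h;
    }

    static long hash(StringBuffer arg) {
        return hash(arg.toString());
    }
}
```

The rotate-and-XOR structure is what allows BuzHash to be updated cheaply as a rolling hash. For one-shot keys like those above, only the mixing quality of the table matters.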
Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.