Cocoon Core 2.2

Writing Cache Efficient Components

The bulk of this document is based on documentation that Sylvain Wallez wrote on Writing for Cache Efficiency; the information is reorganized here so that it is easier to digest. As you may recall, to enable caching for a sitemap component you have to implement the CacheableProcessingComponent contract. Unfortunately, that contract alone does not tell you how to minimize the cost of verifying a component's cache validity. The general strategy that works best for cacheable components is lazy evaluation: wait until the last possible moment before doing your calculations, because you may not need them.

Understanding the Order of Calls

To know when to perform the complex setup for a given resource, it helps to know the exact order of calls as it relates to that component. For the purposes of this document, assume the pipeline has already been set up and Cocoon is now getting the components ready. Cocoon is deterministic in the sense that the call order is the same every time, and making your caching more efficient requires taking advantage of that knowledge. First, the sequence of events:

  1. Cocoon calls setup() on each component, including serializers that implement SitemapModelComponent.
  2. Cocoon calls getMimeType() on the serializer (or reader).
  3. Cocoon calls getKey() on all CacheableProcessingComponents.
  4. Cocoon checks the cache for any Validity objects matching that key.
  5. If there is an entry matching, Cocoon validates the Validity object; otherwise we jump to step 7.
  6. If the Validity object is still valid, Cocoon uses the cached entry in place of calling your component. If the Validity object is invalid, Cocoon calls your component. If the validity cannot be determined, Cocoon falls through to the next step.
  7. If we have gotten to this point, Cocoon calls the getValidity() method on your CacheableProcessingComponent. Cocoon then compares the previous validity object against the new one or, if this is the first call to getValidity(), validates the returned validity object. If the cache entry is valid, Cocoon uses the cached results; otherwise it calls the component.
  8. If the validity still can't be determined, the next step is dependent on the cache component (i.e. default to better performance with the risk of stale data, or default to safety and fresh data).
  9. Assuming we have gotten this far and the key is either not in the cache or the entry is stale, Cocoon calls setXMLConsumer() on all the XMLProducer components (typically generators and transformers), and setOutputStream() on the Serializer or Reader.
  10. Cocoon calls the generate() method on the Generator or Reader.

That's a lot of steps, providing as many opportunities to use a cache as possible.  It also provides the opportunity to delay when we incur certain checks until the last possible moment.

Note: The Cocoon team has been working on an adaptive cache which performs cost calculations.  It measures the cost of generating/transforming a result, the cost of determining its cache validity, and its own influence on the system.  The bottom line is that just because something may be a valid entry, it may still be cheaper to generate the resource in terms of that cost function than to use the cached value.  The only guarantees that you have for when something is going to be called are the methods from the sitemap interfaces and the big component interfaces (i.e. the Generator, Transformer, Serializer, and Reader).  Don't perform any critical setup inside a CacheableProcessingComponent method.
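
The key, lookup, and validation steps in the call order listed above can be illustrated with a small, self-contained Java sketch. Every type here (SketchValidity, StampValidity, SketchComponent, SketchPipeline, CountingComponent) is a hypothetical stand-in written for this illustration and is not the real Cocoon or Excalibur API. The sketch also assumes that a still-undetermined validity is treated as stale and that the fresh validity object is stored with the regenerated content; a real cache component may decide differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a validity object; not the real Cocoon type. */
interface SketchValidity {
    int VALID = 1, UNKNOWN = 0, INVALID = -1;
    /** Cheap self-check that needs no fresh information. */
    int isValid();
    /** Comparison against a freshly obtained validity object. */
    int isValid(SketchValidity fresh);
}

/** Validity based on a modification stamp; it can only decide by comparison. */
final class StampValidity implements SketchValidity {
    final long stamp;
    StampValidity(long stamp) { this.stamp = stamp; }
    public int isValid() { return UNKNOWN; }
    public int isValid(SketchValidity fresh) {
        if (!(fresh instanceof StampValidity)) return INVALID;
        return ((StampValidity) fresh).stamp == stamp ? VALID : INVALID;
    }
}

/** Hypothetical component exposing the calls discussed in the text. */
interface SketchComponent {
    String getKey();
    SketchValidity getValidity();
    String generate();
}

/** Counts how often the expensive generate() call really happens. */
final class CountingComponent implements SketchComponent {
    long stamp = 1;
    int generated = 0;
    public String getKey() { return "page"; }
    public SketchValidity getValidity() { return new StampValidity(stamp); }
    public String generate() { generated++; return "content@" + stamp; }
}

final class SketchPipeline {
    private static final class Entry {
        final SketchValidity validity;
        final String content;
        Entry(SketchValidity validity, String content) {
            this.validity = validity;
            this.content = content;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    String process(SketchComponent c) {
        String key = c.getKey();                      // step 3
        Entry entry = cache.get(key);                 // step 4
        SketchValidity fresh = null;
        if (entry != null) {
            int state = entry.validity.isValid();     // steps 5 and 6
            if (state == SketchValidity.UNKNOWN) {    // step 7
                fresh = c.getValidity();
                state = entry.validity.isValid(fresh);
            }
            // Step 8 (assumption): a still-unknown result counts as stale here.
            if (state == SketchValidity.VALID) return entry.content;
        }
        if (fresh == null) fresh = c.getValidity();
        String content = c.generate();                // steps 9 and 10
        cache.put(key, new Entry(fresh, content));
        return content;
    }
}
```

Processing the same CountingComponent twice through one SketchPipeline calls generate() only once; changing its stamp field makes the stored validity compare as invalid, so the next call regenerates the content.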

Case Study: Improving the TraxTransformer

Back in 2003, the TraxTransformer performed all caching and heavy payload setup within the setup() method. This meant that the TransformerHandler object for the XSLT file was created at the same time as the FileValidity object for that file. The TransformerHandler object is heavy, and there is a lot of work involved in setting it up. The effect was that the TraxTransformer incurred the cost of setting up the TransformerHandler whether it was used or not. When the pipeline pulled from the cache, the TransformerHandler was created and then discarded, adding garbage collection overhead for an object that was never used.

Sylvain saw the problem and delayed creating the TransformerHandler until Cocoon called the setXMLConsumer() method. This ensured that every opportunity was given to check cache validity, and the cost of creating the TransformerHandler was only incurred when Cocoon was really going to use it. Another safe place to complete the setup is the startDocument() SAX method. At that point it is clear that the TransformerHandler is in use, so it works as well.

After everything was said and done, the TraxTransformer performed between 5% and 30% better, depending on the complexity of the TransformerHandler. The key was delaying the heavy lifting until it was actually needed.
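
The pattern behind that change can be shown in a minimal Java sketch. HeavyHandler and LazyTransformer are hypothetical classes invented for this illustration; the method names mirror the ones mentioned in the text, but the signatures are simplified and are not the real Cocoon ones, and this is not the actual TraxTransformer code.

```java
/** Hypothetical stand-in for an expensive object such as a TransformerHandler. */
final class HeavyHandler {
    /** Counts constructions so the cost can be observed. */
    static int created = 0;

    HeavyHandler(String stylesheet) {
        created++; // imagine costly stylesheet compilation happening here
    }
}

/** Hypothetical transformer that defers its heavy setup. */
final class LazyTransformer {
    private String stylesheet;
    private HeavyHandler handler;

    /** Cheap: only remember what the key and validity will need. */
    void setup(String stylesheet) {
        this.stylesheet = stylesheet;
        this.handler = null;
    }

    /** Cheap: the key never requires the heavy handler. */
    String getKey() {
        return stylesheet;
    }

    /** Reached only when the pipeline really runs, so the cost is paid here. */
    void setXMLConsumer(Object consumer) {
        if (handler == null) {
            handler = new HeavyHandler(stylesheet);
        }
    }

    boolean handlerReady() {
        return handler != null;
    }
}
```

In this sketch, calling setup() and getKey() never constructs a HeavyHandler. If the pipeline is served from the cache and setXMLConsumer() is never called, the handler is never built at all.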

AggregateValidity and DelayedAggregateValidity

Some components, like the DirectoryGenerator and the TraxTransformer, rely on the validity of more than just a template or a set of files. These components often can't determine their validity at setup time. The solution is to use the AggregateValidity and, more specifically, the DelayedAggregateValidity. The aggregated validity object provides an interface for adding further validity objects to it and returns the combined result of the set (typically, if one validity object is undetermined or invalid, the whole set is). You can add to the aggregated validity object as the pipeline is executed. Every time the TraxTransformer includes another XML document using the document() function in XSLT, that document's FileValidity is added to the aggregated validity object.

The DirectoryGenerator relies on an internal pipeline being run, and because the validity is not known until after that pipeline has run, it is impossible to set up the validity objects ahead of time. The solution in this case is to use the DelayedAggregateValidity object. Placeholders are supplied through the DelayedValidity interface, and the solid validity object is used once it is ready. Essentially, the full validity object is assembled as the pipeline runs. The next time the aggregated validity object is inspected, it is already set up.

While these are possible solutions to a complex problem, they do incur their own overhead. Done well, the overhead is still less than creating the content fresh every time, but take care not to build a huge validity object tree of aggregate validity objects that include aggregate validity objects that include yet more aggregate validity objects. In short, keep it simple. The general rule of thumb is that if you can't write a simple unit test for it, you probably need to look for ways to simplify. Cocoon has many tools for caching and cache control; understanding how they work will help you write more efficient components.
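
As a rough illustration of the idea, the following Java sketch combines several validity objects and lets one of them be a placeholder that is resolved later. PartValidity, FixedValidity, DelayedPart, and AggregateSketch are hypothetical names made up for this example and do not reproduce the classes named above. The sketch assumes that a placeholder resolves its real validity the first time it is inspected and that an invalid member outweighs an undetermined one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical validity contract with three possible states. */
interface PartValidity {
    int VALID = 1, UNKNOWN = 0, INVALID = -1;
    int isValid();
}

/** A validity whose state is already known. */
final class FixedValidity implements PartValidity {
    private final int state;
    FixedValidity(int state) { this.state = state; }
    public int isValid() { return state; }
}

/** Placeholder that obtains the real validity only on first inspection. */
final class DelayedPart implements PartValidity {
    private final Supplier<PartValidity> source;
    private PartValidity resolved;

    DelayedPart(Supplier<PartValidity> source) { this.source = source; }

    public int isValid() {
        if (resolved == null) {
            resolved = source.get(); // resolved once, then reused
        }
        return resolved.isValid();
    }
}

/** Aggregate that can keep growing while the pipeline runs. */
final class AggregateSketch implements PartValidity {
    private final List<PartValidity> parts = new ArrayList<>();

    void add(PartValidity part) { parts.add(part); }

    public int isValid() {
        int result = VALID;
        for (PartValidity part : parts) {
            int state = part.isValid();
            if (state == INVALID) return INVALID;   // one invalid member spoils the set
            if (state == UNKNOWN) result = UNKNOWN; // remember, but keep checking
        }
        return result;
    }
}
```

Because AggregateSketch is itself a PartValidity, aggregates can be nested inside aggregates, which is exactly the kind of deep validity tree the text warns against. Keeping the structure flat also keeps it easy to cover with a simple unit test.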