How the Cocoon Engine Works

This project has retired. For details please refer to its Attic page.

How Cocoon 1.8 works

This document follows the operations of Cocoon from a "document point of view", while the javadoc documentation describes them from a "procedural point of view". We therefore try to complement the javadoc rather than repeat what is already stated there. Furthermore, since the ultimate documentation is the source code itself, this document does not go too deep and is meant to be read alongside the comments in the code. In fact, some people may find that reading the source code directly sheds more light than reading this (significantly incomplete) overview.

Unless otherwise specified, for the sake of brevity any class name is assumed to have the org.apache.cocoon prefix prepended to it.

Cocoon

This is the "main" class, whether Cocoon is used as a servlet or from the command line. Accordingly, it contains the init method for the servlet case as well as the main method for the command-line case.

The sections below describe the operations in the two common cases: command-line execution (typically used for offline site creation) and servlet usage.

From the Command-line

When Cocoon is invoked from the command-line, it requires as arguments the location of the cocoon.properties, the name of the file containing the XML to be processed, and the name of the output file. After reading the properties file, it creates a new EngineWrapper initialized with the above mentioned properties and then calls the handle method, and hands it an output Writer and an input File. There is no good reason for this asymmetry - the command-line operation mode of Cocoon was coded quickly as a temporary hack to meet a popular need, in lieu of the better, more integrated and well-designed command-line support planned for Cocoon 2.

EngineWrapper

This is a "hack" which provides a "fake" implementation of the Servlet API methods that are needed by Cocoon, in the inner classes HttpServletRequestImpl and HttpServletResponseImpl. When Cocoon gets integrated with Stylebook, this class will probably need to be cleaned up.

Basically, this class instantiates an Engine class and passes it the "fake" request and response objects mentioned above.
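The adapter idea behind EngineWrapper can be illustrated with a minimal sketch. Everything named here (Request, Response, EngineLike, WrapperSketch) is a hypothetical stand-in invented for illustration; the real class implements the much larger Servlet API interfaces in its inner classes.

```java
import java.io.File;
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical, minimal stand-ins for the slice of the Servlet API an engine might need.
interface Request { String getPathTranslated(); }
interface Response { PrintWriter getWriter(); }
interface EngineLike { void handle(Request request, Response response); }

final class WrapperSketch {
    private final EngineLike engine;

    WrapperSketch(EngineLike engine) {
        this.engine = engine;
    }

    // Adapts a plain Writer and File to the request/response pair the engine expects.
    void handle(Writer out, File input) {
        PrintWriter writer = new PrintWriter(out);
        Request request = () -> input.getPath();   // "fake" request: only knows the file path
        Response response = () -> writer;          // "fake" response: only exposes the writer
        engine.handle(request, response);
        writer.flush();
    }
}
```

The engine never learns that it is not running inside a servlet container, which is the whole point of the "hack".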

As a Servlet

Startup Phase

As for any servlet, upon startup the init method is invoked. In Cocoon, this tries to load the cocoon.properties file, and, if that is successful, creates an Engine instance.

Production Phase

A service method is provided by Cocoon, which accepts all incoming requests, whatever their type. Servlet programmers may be accustomed to writing doGet or doPost methods to handle different types of requests, which is fine for simple servlets; however, a service method is the best way to implement a fully generic servlet like Cocoon.

Engine

This class implements the engine that does all the document processing.

What better definition of the function of this class than the words of its author (Stefano Mazzocchi)? From this otherwise lapidary definition, one should realize the importance of this class in the context of the Cocoon operations, and one should therefore read it through carefully in order to understand the "big picture" of how Cocoon works.

Startup Phase

Whether Cocoon is started from the command line or as a servlet, the Engine is instantiated at startup by the private Engine constructor. To understand Cocoon operations, it is important to know that at this point in time (and only at this point in the whole lifespan of the Cocoon servlet) the objects performing the initialization of the various components are instantiated with the parameters contained in the Configuration object. This is why changes applied to the cocoon.properties file have no effect on Cocoon until the engine is stopped and restarted.

These objects either directly represent the components (such as logger.ServletLogger) or are Factories to provide the correct components for a particular request (such as processor.ProcessorFactory). The long-winded setup code involved here reads class names from the cocoon.properties file and dynamically loads and configures the classes, thus allowing for easy "swapping in and out" of components without recompiling the whole of Cocoon.

In general, all components referenced here must be loadable at startup, otherwise Cocoon will refuse to initialize - even if the missing components are not actually used in the web application. Still, this is exactly the same situation as with a more conventional Java application which does not store class names in configuration files.
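The configuration-driven loading described above can be sketched in a few lines of plain Java. The Component interface, the ConsoleLogger class and the property key used in the usage note are hypothetical; the actual Cocoon interfaces and property names differ.

```java
import java.util.Properties;

// Hypothetical component contract; the real Cocoon interfaces differ.
interface Component {
    String name();
}

// Stand-in for a component that ships with the framework.
class ConsoleLogger implements Component {
    public String name() {
        return "console-logger";
    }
}

final class ComponentLoader {
    private ComponentLoader() {}

    // Reads a class name from the properties and instantiates it reflectively.
    // Fails immediately when the class cannot be loaded, as happens at engine startup.
    static Component load(Properties props, String key) {
        String className = props.getProperty(key);
        if (className == null) {
            throw new IllegalStateException("No class configured for '" + key + "'");
        }
        try {
            Class<?> type = Class.forName(className);
            return (Component) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load component " + className, e);
        }
    }
}
```

With a property such as logger.class set to the name of ConsoleLogger, calling ComponentLoader.load(props, "logger.class") returns a ready instance; swapping the implementation only requires editing the properties, not recompiling the caller.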

Production Phase

The handle method has already been mentioned and is indeed the focal point for all the runtime operations of Cocoon. It is invoked with two objects, one being the input HttpServletRequest and one being the output HttpServletResponse (just as in a servlet).

Until the whole page is done, it repeats the following process up to 10 times. The pipeline only needs to be repeated if an OutOfMemoryError occurs, in which case part of the cache is cleared and the pipeline is restarted:

Creates the Page wrapper for caching purposes

Gets the initial document Producer from the ProducerFactory. The HTTP parameter "producer=myproducer" can be used to select the producer; if this parameter is not present, the default producer is used.

Calls the producer to generate an org.w3c.dom.Document

Sets up the hash table environment to pass various parameters to the processor pipeline

Processes the document through the document Processors (obtained from the ProcessorFactory), one for each processor invoked in the Document

Gets the Formatter requested by the Document from the FormatterFactory

Formats the page

Fills the Page bean with content

Sets the content type and the encoding

Finally,

Prints the page to the response's PrintWriter object

Appends timing information as an XML comment, if the content type allows

Flushes the PrintWriter to the client

Caches the page (if caching is enabled)
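The steps above can be sketched roughly as follows. This is not the Engine source: the class and parameter names are hypothetical, a plain String stands in for the org.w3c.dom.Document, and caching, timing and content-type handling are left out so that the produce/process/format sequence and the retry on OutOfMemoryError stay visible.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// All names here are hypothetical stand-ins, not the Cocoon 1.8 classes.
final class PipelineSketch {
    static final int MAX_ATTEMPTS = 10;

    private PipelineSketch() {}

    // Runs producer -> processors -> formatter, restarting when memory runs out.
    static String handle(Supplier<String> producer,
                         List<UnaryOperator<String>> processors,
                         UnaryOperator<String> formatter,
                         Runnable shrinkCache) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                String document = producer.get();              // produce the document
                for (UnaryOperator<String> processor : processors) {
                    document = processor.apply(document);      // chain processors in order
                }
                return formatter.apply(document);              // format the page
            } catch (OutOfMemoryError e) {
                shrinkCache.run();                             // free some cache, then retry
            }
        }
        throw new IllegalStateException("Gave up after " + MAX_ATTEMPTS + " attempts");
    }
}
```

Note that the chaining itself is just the inner loop: each processor receives the output of the previous one.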

Now, I suggest you take a deep breath and read the above steps again, since the simplicity of the algorithm is so beautiful that it makes sense to appreciate it in depth and breadth.

At this point the key elements are therefore the processors and the formatters, which operate directly upon the content of the Document. We are going to investigate them in detail. It should already be clear that one can have more than one Processor per Document and that these are applied sequentially, one after the other. This is how the "chaining" of various Processors is implemented: in five lines of code (including debugging information). Again, simplicity and good coding style are assets of this implementation. Let us then have a look at what Processors and Formatters are, since these are the components most likely to be extended with new implementations for specific needs.

ProducerFactory

For each source there must be an appropriate Producer implemented. Currently (version 1.8), only ProducerFromFile is implemented. This is because XSP provides the best solution (both in terms of ease-of-use and forward-compatibility with Cocoon 2) for nearly all dynamic content solutions, so there is usually no need to write a Producer explicitly.

ProcessorFactory

For each processing instruction type there must be an appropriate Processor implemented. Currently (version 1.8), the following ones are implemented:

Lightweight Directory Access Protocol (LDAP)

SQL (deprecated - SQL or ESQL taglibs are preferred)

eXtensible Server Pages (supersedes Dynamic Content Processor)

Dynamic Content Processor (deprecated, use XSP instead)

XInclude (attempts to implement a W3C draft standard, but may not always be up to date with the standard - as it is still evolving)

XSLT (implements the W3C Recommendation, XSLT)

FormatterFactory

For each format in which the output should be delivered (e.g. PDF, TEXT, HTML, XML, XHTML), there must be an appropriate Formatter implemented. Currently (version 1.8), the following ones are distributed:

HTML

XHTML (while the HTML formatter writes some tags without closing tags for compatibility with older user agents, the XHTML formatter is fully XML-compliant - indeed, it is just the XML formatter with a specific doctype.)

Text (i.e. plain text)

XML

FO2PDF (transforms XSL:FO to PDF which can be read by Acrobat Viewer/Reader)

Clearly, one might imagine many more formatters such as

FO2RTF Microsoft Rich Text Format

FO2MIF FrameMaker Interchange Format

BRAILLE

In Cocoon 1.8 all of the formatters provided are in fact implemented as simple "wrapper" classes (as can be easily seen by examining the source code in the formatters directory) which merely set the parameters to the Apache Serializers, or in the case of FO2PDF, Apache FOP, and then delegate the actual formatting to those classes. In a way, no "real work" actually goes on in the Formatter classes themselves. As you can see, Cocoon is a framework which tries not to reinvent the wheel too often!
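To illustrate what such a "wrapper" looks like, here is a minimal sketch that only sets output parameters and hands the real work to an existing serializer. It is an assumption-laden analogy: it delegates to the identity Transformer bundled with the JDK rather than to the Apache Serializers, and the DelegatingFormatter class is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical formatter: configures a serializer and delegates to it.
final class DelegatingFormatter {
    private final String method; // "xml", "html" or "text"

    DelegatingFormatter(String method) {
        this.method = method;
    }

    String format(Document document) {
        try {
            // The identity transformer plays the role of the external serializer.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, method);
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Builds a tiny document (<p>hi</p>) to try the formatter on.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.setTextContent("hi");
            doc.appendChild(p);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }
}
```

Two formatters built this way differ only in the parameters they pass on, which mirrors the observation that no "real work" happens in the Formatter classes themselves.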

If you're wondering why FO2PDF isn't a Processor instead of a Formatter, the answer is simple - it is conceptually more of a Processor (it transforms the entire document), but for one vital difference - it does not output XML. Yes, there is the workaround that XSP uses internally, which is to output one XML element with all the content inside that as a text node - but this method would be rather clunky for FO2PDF and would provide no real benefit.

Note that the CPU-intensive processing required for FO2PDF can be obviated by the use of newer XML-compliant graphics and document markup languages on the client side, such as SVG (Scalable Vector Graphics), or XSL:FO itself, which can just be written out as XML. This is definitely the future for dynamic web publishing, since the "rendering" of dozens of concurrent users' documents into PDF all on the server does not make any sense from a performance point of view - it is advantageous today of course because current popular browsers do not support XSL:FO or SVG natively, but in the future this will change.

In fact, XML markup languages like VoiceXML are supported by Cocoon by returning XML and indeed in that case the parameter to cocoon-format is text/xml! In the case of VRML, the cocoon format is model/vrml which in the cocoon.properties configuration file is mapped to TextFormatter.
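The lookup from a cocoon-format type to a formatter can be sketched as below. This is only an illustration under stated assumptions: the FormatLookup class is hypothetical, the formatter table is a hand-built map rather than the real cocoon.properties entries, and the pairing of text/xml with an XML formatter as well as the fallback formatter are assumptions; only the model/vrml to TextFormatter mapping comes from the text above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Cocoon.
final class FormatLookup {
    private static final Pattern TYPE = Pattern.compile("type\\s*=\\s*\"([^\"]*)\"");

    private FormatLookup() {}

    // Returns the type named by the first top-level cocoon-format instruction, or null.
    static String requestedType(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof ProcessingInstruction) {
                    ProcessingInstruction pi = (ProcessingInstruction) n;
                    if (pi.getTarget().equals("cocoon-format")) {
                        Matcher m = TYPE.matcher(pi.getData());
                        if (m.find()) {
                            return m.group(1);
                        }
                    }
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse document", e);
        }
    }

    // Maps the requested type to a formatter name, using the fallback when none is requested.
    static String formatterFor(String xml, Map<String, String> formatters, String fallback) {
        String type = requestedType(xml);
        return type == null ? fallback : formatters.getOrDefault(type, fallback);
    }
}
```

A document whose prolog contains a cocoon-format instruction with type="model/vrml" would thus be routed to whatever formatter the table lists for that type, while a document without the instruction falls back to the default.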