How Cocoon 1.8 works
This document tries to follow the operations of Cocoon from a
"document point of view", whereas the javadoc documentation describes them
from a "procedural point of view".
Therefore, here we try to be complementary to the javadoc rather than simply
repeat what is already stated there. Furthermore, since the ultimate
documentation is the source code itself, this document does not try to go too
deep, but rather to complement the comments in the code. In fact, some people
may find that reading the source code directly sheds more light than reading
this (significantly incomplete) document.
Unless otherwise specified, for the sake of brevity any class name
is assumed to have the
org.apache.cocoon prefix prepended to it.
The Cocoon class is the "main" class, whether Cocoon is being used as a servlet
or from the command-line. It contains the init and service methods
for the former case as well as main for the latter case.
The operations in the two common cases, command-line execution (typically
used for offline site creation) and servlet usage, are described below.
From the Command-line
When Cocoon is invoked from the command-line, it requires as
arguments the location of the cocoon.properties file, the name
of the file containing the XML to be processed, and the name of the output
file. After reading the properties file, it creates a new
EngineWrapper initialized with the above-mentioned properties
and then calls its handle method, handing it an output
Writer and an input File. There is no good
reason for this asymmetry - the command-line operation mode of Cocoon was
coded quickly as a temporary hack to meet a popular need, in lieu of the
better, more integrated and well-designed command-line support planned for
Cocoon 2.
EngineWrapper is a "hack" which provides a "fake" implementation of
the Servlet API methods that are needed by Cocoon, in its inner classes (among
them HttpServletResponseImpl). When Cocoon gets integrated
with Stylebook, this class will probably need to be cleaned up.
Basically, this class instantiates an Engine and passes
it the "fake" request and response objects mentioned above.
As a Servlet
As for any servlet, upon startup the init method is invoked. In Cocoon,
this method tries to load the cocoon.properties file and, if that is
successful, creates an Engine.
The service method, provided by the Cocoon class, accepts all incoming
requests, whatever their type. Servlet programmers may be accustomed to
writing doGet and doPost methods to handle different types of requests,
which is fine for simple servlets; overriding the service method, however,
is the best way to implement a fully generic servlet like Cocoon.
The Engine class is summed up by its author, Stefano Mazzocchi, in a single
sentence: "This class implements the engine that does all the document
processing." What better definition of the function of this class than the
words of its author? From this otherwise lapidary definition, one should
realize the importance of this class in the context of the Cocoon operations,
and thus one should carefully read its source in order to understand the
"big picture" of how Cocoon works.
Whether Cocoon is started from the command-line or as a servlet, upon startup
the Engine is instantiated by the private Engine constructor. For the sake of
understanding Cocoon operations, it is important to know that at this point in
time (and only at this time in the whole lifespan of the Cocoon servlet) the
objects making up the various components are instantiated and initialized with
the parameters contained in the Configuration object.
This is the reason why changes applied to the cocoon.properties file do not
have any effect on Cocoon until the engine is stopped and restarted.
These objects either directly represent the components
or are factories that provide the correct components
for a particular request (such as the ProducerFactory).
The long-winded setup code involved here reads class names from the
cocoon.properties file and dynamically loads and configures
the classes, thus allowing for easy "swapping in and out" of components
without recompiling the whole of Cocoon.
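As a minimal sketch of this pattern, assuming a hypothetical ConfigurableComponent interface and a hypothetical component.class property key (the real Cocoon 1.8 interfaces and cocoon.properties keys are different), the following Java code reads a class name from a java.util.Properties object, loads the class reflectively and configures the new instance. Swapping an implementation then only requires editing the properties text, not recompiling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical component contract; Cocoon 1.8's real interfaces differ.
interface ConfigurableComponent {
    void init(Properties conf);
    String describe();
}

// Two interchangeable implementations: the configuration decides which one is used.
class UpperCaseComponent implements ConfigurableComponent {
    private String prefix = "";
    public void init(Properties conf) { prefix = conf.getProperty("upper.prefix", ""); }
    public String describe() { return prefix + "UPPER"; }
}

class LowerCaseComponent implements ConfigurableComponent {
    public void init(Properties conf) { }
    public String describe() { return "lower"; }
}

class ComponentLoader {
    /** Parses properties text, as it would be read from a file such as cocoon.properties. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /**
     * Reads the class name stored under key, loads the class, instantiates it
     * through its no-argument constructor and configures it. Any failure is fatal,
     * mirroring Cocoon refusing to initialize when a component cannot be loaded.
     */
    static ConfigurableComponent load(Properties conf, String key) {
        String className = conf.getProperty(key);
        if (className == null) {
            throw new IllegalStateException("Missing property: " + key);
        }
        try {
            Class<?> cls = Class.forName(className.trim());
            ConfigurableComponent component =
                    (ConfigurableComponent) cls.getDeclaredConstructor().newInstance();
            component.init(conf);
            return component;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load component " + className, e);
        }
    }
}
```

A missing or misspelled class name surfaces immediately as an IllegalStateException when the component is loaded, rather than later when a request happens to need it.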
In general, all components
referenced here must be loadable at startup, otherwise Cocoon will refuse
to initialize - even if the missing component(s) are not actually used in
the web-application. Still, this is exactly the same situation as with
a more conventional Java application which does not store class names in
configuration files.
The handle method, already mentioned above,
is indeed the focal point for all the runtime operations of Cocoon.
It is invoked with two objects, one being the input
HttpServletRequest and the other being the output
HttpServletResponse (just as in a servlet).
Until the whole page is done, it repeats the following process up to
10 times (the pipeline only needs to be repeated if an OutOfMemoryError
occurs, in which case the cache is cleared out somewhat and the pipeline
is run again):
- Creates the Page wrapper for caching purposes
- Gets the initial document Producer from the
ProducerFactory. The HTTP parameter "producer=myproducer"
can be used to select the producer; if this parameter is not present,
the default producer is used.
- Calls the producer to generate an initial Document
- Sets up the environment hash table to pass various parameters
to the processor pipeline
- Processes the document through the document processors
(obtained from the processor factory), one for each processor invoked
in the document
- Gets the Formatter requested by the document
- Formats the page
- Fills the Page bean with content
- Sets the content type and the encoding
- Prints the page to the response's PrintWriter object
- Appends timing information as an XML comment, if the content type allows it
- Flushes the PrintWriter to the client
- Caches the page (if caching is enabled)
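The control flow of the steps above can be sketched as follows. This is a simplified, hypothetical model rather than Cocoon's actual Engine.handle code: the PageCache and PipelineRunner names are invented for illustration, the whole produce/process/format pipeline is reduced to a single Function, and the retry policy (up to 10 attempts, shrinking the cache after each OutOfMemoryError) follows the description above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Function;

// Hypothetical page cache: insertion-ordered so the oldest pages are evicted first.
class PageCache {
    private final LinkedHashMap<String, String> pages = new LinkedHashMap<>();

    void put(String uri, String page) { pages.put(uri, page); }
    String get(String uri) { return pages.get(uri); }
    int size() { return pages.size(); }

    /** Evicts about half of the entries, oldest first, to free memory. */
    void shrink() {
        int toRemove = Math.max(1, pages.size() / 2);
        Iterator<String> it = pages.keySet().iterator();
        while (toRemove > 0 && it.hasNext()) {
            it.next();
            it.remove();
            toRemove--;
        }
    }
}

class PipelineRunner {
    static final int MAX_ATTEMPTS = 10;

    /**
     * Runs the pipeline for one request. It is repeated only when it fails
     * with OutOfMemoryError, after the cache has been shrunk.
     */
    static String handle(String uri, Function<String, String> pipeline,
                         PageCache cache, boolean cachingEnabled) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                String page = pipeline.apply(uri);  // produce, process and format the page
                if (cachingEnabled) {
                    cache.put(uri, page);           // cache the finished page
                }
                return page;
            } catch (OutOfMemoryError e) {
                cache.shrink();                     // free some memory, then try again
            }
        }
        throw new IllegalStateException(
                "Giving up on " + uri + " after " + MAX_ATTEMPTS + " attempts");
    }
}
```

Catching OutOfMemoryError is normally discouraged in Java; it is tolerable here only because the retry is bounded and each attempt first releases cached memory.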
Now, I suggest you take a deep breath and read the above steps again: the
algorithm is so simple and elegant that it is worth appreciating in depth
and breadth.
At this point, the key elements are therefore the processors and the formatters,
which directly operate upon the content of the Document. We are going to
investigate them in detail. It should already be clear that one can have
more than one processor per Document and that the processors
are applied sequentially, one after the other. This is how the "chaining"
of various processors is implemented, in five lines of code (including
debugging information).
Again, simplicity and good coding style are assets of this implementation.
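The chaining idea can be illustrated with a small sketch. The DocumentProcessor interface below is hypothetical (Cocoon 1.8's real processor interface has its own signature); the only point is that each processor takes a DOM Document plus the environment table and returns a Document, so a chain of processors reduces to a simple loop.

```java
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical processor contract: a DOM Document in, a (possibly new) Document out.
interface DocumentProcessor {
    Document process(Document doc, Map<String, Object> environment);
}

class ProcessorChain {
    /** Applies the processors in order; each one receives the previous one's output. */
    static Document run(Document doc, List<DocumentProcessor> chain,
                        Map<String, Object> environment) {
        for (DocumentProcessor processor : chain) {
            doc = processor.process(doc, environment);
        }
        return doc;
    }

    /** Helper for the example: creates a document with a single empty root element. */
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because each processor returns the document it hands on, a processor may either modify the DOM in place or return a brand-new Document, as an XSLT transformation would.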
Let us then have a look at what Producers, Processors and
Formatters are, since these could be leveraged further and are indeed
likely to be extended with new components for specific needs.
For each source there must be an appropriate Producer implemented. Currently
(version 1.8), only ProducerFromFile is implemented. This is because XSP provides
the best solution (both in terms of ease-of-use and forward-compatibility with
Cocoon 2) for nearly all dynamic content needs, so there is usually
no need to write a Producer explicitly.
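For illustration, a file-based producer together with the "producer" request-parameter selection described earlier can be sketched as below. The SimpleProducer interface and the FileProducer and ProducerRegistry classes are hypothetical names, not the API of Cocoon's ProducerFromFile or ProducerFactory; the parsing itself uses the standard JAXP DocumentBuilder.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical producer contract: turns a request path into an initial DOM Document.
interface SimpleProducer {
    Document getDocument(String path) throws Exception;
}

// Reads the XML source from the file system, in the spirit of ProducerFromFile.
class FileProducer implements SimpleProducer {
    public Document getDocument(String path) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new File(path));
    }
}

// Hypothetical registry mimicking selection by the "producer" HTTP parameter.
class ProducerRegistry {
    private final Map<String, SimpleProducer> producers = new HashMap<>();
    private final String defaultName;

    ProducerRegistry(String defaultName) { this.defaultName = defaultName; }

    void register(String name, SimpleProducer producer) { producers.put(name, producer); }

    /** Picks the producer named by the "producer" parameter, or the default one if absent. */
    SimpleProducer select(Map<String, String> requestParameters) {
        String name = requestParameters.getOrDefault("producer", defaultName);
        SimpleProducer producer = producers.get(name);
        if (producer == null) {
            throw new IllegalArgumentException("No producer named " + name);
        }
        return producer;
    }
}
```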
For each processing instruction type there must be an appropriate Processor
implemented. Currently (version 1.8), the following ones are implemented:
- Lightweight Directory Access Protocol (LDAP)
- SQL (deprecated - the SQL or ESQL taglibs are preferred)
- eXtensible Server Pages (XSP, supersedes the Dynamic Content Processor)
- Dynamic Content Processor (deprecated, use XSP instead)
- XInclude (attempts to implement a W3C draft standard, but may not always
be up to date with the standard - as it is still evolving)
- XSLT (implements the W3C Recommendation, XSLT)
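Processors are requested by processing instructions in the document itself. As a sketch, assuming the <?cocoon-process type="..."?> instruction form used by Cocoon 1.x, the following code collects the requested processor types in document order using only the standard DOM API. The ProcessingInstructions class and the regular expression used to read the type pseudo-attribute are illustrative choices, not Cocoon's own code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class ProcessingInstructions {
    // Matches type="..." or type='...' inside the PI data (a "pseudo-attribute").
    private static final Pattern TYPE = Pattern.compile("\\btype\\s*=\\s*[\"']([^\"']*)[\"']");

    /** Returns the processor types requested by top-level cocoon-process PIs, in order. */
    static List<String> processorTypes(Document doc) {
        List<String> types = new ArrayList<>();
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) node;
                if (pi.getTarget().equals("cocoon-process")) {
                    Matcher m = TYPE.matcher(pi.getData());
                    if (m.find()) {
                        types.add(m.group(1));
                    }
                }
            }
        }
        return types;
    }

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }
}
```

Each collected type would then be looked up among the installed processors, which is why every processing instruction type needs a matching Processor implementation.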
For each format in which the output should be delivered
(e.g. PDF, TEXT, HTML, XML, XHTML), there must be an appropriate Formatter
implemented. Currently (version 1.8), the following ones are distributed:
- XHTML (while the HTML formatter writes some tags without closing tags for
compatibility with older user agents, the XHTML formatter is fully
XML-compliant - indeed, it is just the XML formatter with a specific doctype.)
- Text (i.e. plain text)
- FO2PDF (transforms XSL:FO to PDF which can be read by Acrobat Viewer/Reader)
Clearly, one might imagine many more formatters, such as:
- FO2RTF (Microsoft Rich Text Format)
- FO2MIF (FrameMaker Interchange Format)
In Cocoon 1.8 all of the formatters provided are in fact implemented as simple
"wrapper" classes (as can be easily seen by examining the source code in the
formatters directory) which merely set the parameters of the Apache
Serializers (or, in the case of FO2PDF, of Apache FOP) and then delegate the actual
formatting to those classes. In a way, no "real work" actually goes on
in the Formatter classes themselves. As you can see, Cocoon is a framework which
tries not to reinvent the wheel too often!
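The wrapper pattern can be sketched as follows. Since the Apache Serializer API is not reproduced here, this hypothetical XmlFormatter delegates instead to the standard JAXP identity Transformer (a swapped-in technique, not what Cocoon 1.8 itself uses): all the wrapper does is set output parameters and hand the actual serialization over.

```java
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical formatter: configures a serializer and delegates all real work to it.
class XmlFormatter {
    private final String method;    // "xml", "html" or "text"
    private final String encoding;
    private final boolean indent;

    XmlFormatter(String method, String encoding, boolean indent) {
        this.method = method;
        this.encoding = encoding;
        this.indent = indent;
    }

    /** Serializes the document to out; the identity Transformer does the actual formatting. */
    void format(Document doc, Writer out) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.METHOD, method);
        serializer.setOutputProperty(OutputKeys.ENCODING, encoding);
        serializer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    }
}
```

Switching the output format is then only a matter of constructing the wrapper with different parameters, for example new XmlFormatter("html", "ISO-8859-1", true).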
If you're wondering why FO2PDF isn't a Processor instead of a Formatter, the
answer is simple - it is conceptually more of a Processor (it transforms the entire
document), except for one vital difference: it does not output XML. Yes, there is
the workaround that XSP uses internally, which is to output a single XML element
containing all the content as a text node - but this method would be rather clunky
for FO2PDF, whose output is binary, and would provide no real benefit.
Note that the CPU-intensive processing required for FO2PDF can be avoided by
using newer XML-compliant graphics and document markup languages on the client
side, such as SVG (Scalable Vector Graphics) or XSL:FO itself, which can simply be
written out as XML. This is definitely the future of dynamic web
publishing, since "rendering" dozens of concurrent users' documents into PDF
on the server makes no sense from a performance point of view. Server-side
rendering is worthwhile today only because current popular browsers do not support
XSL:FO or SVG natively, but this will change in the future.
In fact, XML markup languages like VoiceXML are supported by Cocoon simply by
returning XML, and indeed in that case the parameter to cocoon-format is
text/xml! In the
case of VRML, the cocoon format is
model/vrml which in the
configuration file is mapped to