Write a Custom Generator
Introduction
This Tutorial describes the steps necessary to write a basic Cocoon generator. Starting with a quick "Hello World" example and progressing to slightly more involved examples should give a good start to those whose applications call for extending Cocoon with a custom generator.
The intention is to provide:
- the basics of creating SAX events in a C2 generator
- a little understanding of the Avalon container contract as it relates to C2 generators
- a little understanding of the factors that would influence the decision about which xxxGenerator to extend
Purpose
The flexibility to extend the basic "Out of the box" functionality of Cocoon will be an important feature for Cocoon's viability as a broadly used application framework. Though the documentation on "Extending Cocoon" (at least at this writing) seems to have a hard time imagining applications for custom generators outside of the bizarre, I imagine several scenarios which could call for it:
- A datasource as yet undeveloped in Cocoon (e.g. event logs)
- Database driven applications for which XSP is either too awkward or holds too many performance questions. The need for high scalability will drive some (such as myself) to seek optimization in custom generators that just do not seem reasonable to expect out of the auto-generated code that XSPs produce. The current Performance Tips documentation seems to lead in this direction.
- Customized control over the caching behaviour if not provided for by other means.
Important
There are other options that should be considered before settling on a new generator. One notable consideration is the option of writing a Source that would fit your needs. See this discussion from the mailing list for an introduction to the idea. Of course, XSP should be considered - I have not seen any performance comparisons that quantify the benefit that can be had from a custom generator. Finally, be sure you understand the purpose and capabilities of all current standard Generators, as well as those in the scratchpad (for instance, there is a TextParserGenerator in the scratchpad at the moment which may be configurable enough to process the event log need mentioned above). Cocoon is a rapidly developing technology that may have anticipated your need. Because the documentation lags behind development, you may find more by examining the source directory and searching the mail archives for applicable projects.
Intended Audience
This Tutorial is aimed at users who have developed an understanding of the basics of Cocoon and have a need to begin extending it for their own purposes, or desire a deeper understanding of what goes on under the hood.
Prerequisites
Generator developers should have:
- Read Cocoon Concepts , as well as Extending Cocoon , and the broad overview of Avalon , the framework upon which Cocoon is built.
- An installed version of Cocoon if you want to follow the examples yourself (obviously).
- A good understanding of Java.
- Java SDK (1.2 or later) "installed".
Diving In
Let us start with a simple "Hello World" example:
Simple Example
Our goal will be to build the following document (or, more to the point, the SAX events that would correspond to this document).
<example>Hello World!</example>
An example of code that will send the correct SAX events down the pipeline:
import org.apache.cocoon.generation.AbstractGenerator; import org.xml.sax.helpers.AttributesImpl; import org.xml.sax.SAXException; public class HelloWorldGenerator extends AbstractGenerator { AttributesImpl emptyAttr = new AttributesImpl(); /** * Override the generate() method from AbstractGenerator. * It simply generates SAX events using SAX methods. * I haven't done the comparison myself, but this * has to be faster than parsing them from a string. */ public void generate() throws SAXException { // the org.xml.sax.ContentHandler is inherited // through org.apache.cocoon.xml.AbstractXMLProducer contentHandler.startDocument(); contentHandler.startElement("", "example", "example", emptyAttr); contentHandler.characters("Hello World!".toCharArray(),0, "Hello World!".length()); contentHandler.endElement("","example", "example"); contentHandler.endDocument(); } }
So, the basic points are that we extend AbstractGenerator, override its generate() method, call the relevant SAX methods on the contentHandler (inherited from AbstractGenerator) to start, fill and end the document. For information on the SAX api, see www.saxproject.org
What to Extend?
How did we choose to extend AbstractGenerator? Generators are defined by the org.apache.cocoon.generation.Generator interface. The only direct implementation of this of interest to us is AbstractGenerator, which gives a basic level of functionality. Another option would have been ServiceableGenerator, which would give us the added functionality of implenting the Avalon interface Serviceable, which would signal the container that handles all the components including our generator to give us a handle back to the ServiceManager during the startup of the container. If we needed to lookup a pooled database connection, or some other standard or custom Cocoon component, this is what we would do. Most of the out of the box Generators extend ServiceableGenerator. Other abstract Generators you may look to for inspiration (or choose to extend) include the ServletGenerator, and AbstractServerPage. While these both introduce functionality specific to their eventual purpose - the JSP and XSP generators, they do make a convenient starting place for many other Generators.
Running The Sample
In order to run this sample, you will need to compile the code, deploy it into the cocoon webapp, and modify the sitemap to declare our generator and allow access to it via a pipeline.
Compile
Save this source as HelloWorldGenerator.java and compile it using
javac -classpath %PATH_TO_JARS%\cocoon.jar;%PATH_TO_JARS%\xml-apis.jar HelloWorldGenerator.java
Unfortunately for me, the exact name of your cocoon and xml-apis jars may vary with exactly which distribution, or CVS version you are using, since the community has taken to appending dates or versions at the end of the jar name to avoid confusion. Be sure to find the correct name on your system and substitute it in the classpath. Also, you have several options on where to find jars. If you have a source version that you built yourself, you may want to point to lib\core\ for them. If you have only the binary version, you can find them in WEB-INF\lib\
Deploy
Simply copy the class file into the %TOMCAT_HOME%\webapps\cocoon\WEB-INF\classes directory
Sitemap Modifications
You need to do two things: in the map:generators section, add an element for your class:
<map:generator name="helloWorld" src="HelloWorldGenerator"/>
Then add a pipeline to sitemap.xmap which uses it:
... <map:match pattern="heyThere.xml"> <map:generate type="helloWorld"/> <map:serialize type="xml"/> </map:match> ...
And finally, our creation should be available at http://localhost:8080/cocoon/heyThere.xml
Depending on your exact setup, you may need to restart Tomcat (or whatever your servlet container is) to get there.
A Less Trivial Example
Moving on to a less trivial example, we will take some information out of the Request, and construct a slightly more involved document. This time, our goal will be the following document:
<doc> <uri>...</uri> <params> <param value="...">...</param> ... </params> <date>..</date> </doc>
The values of course will be filled in from the request, and will depend on choices we make later.
import org.apache.cocoon.generation.AbstractGenerator; import org.xml.sax.helpers.AttributesImpl; import org.xml.sax.SAXException; // for the setup() method import org.apache.cocoon.environment.SourceResolver; import java.util.Map; import org.apache.avalon.framework.parameters.Parameters; import org.apache.cocoon.ProcessingException; import java.io.IOException; // used to deal with the request parameters. import org.apache.cocoon.environment.ObjectModelHelper; import org.apache.cocoon.environment.Request; import java.util.Enumeration; import java.util.Date; public class RequestExampleGenerator extends AbstractGenerator { // Will be initialized in the setup() method and used in generate() Request request = null; Enumeration paramNames = null; String uri = null; // We will use attributes this time. AttributesImpl myAttr = new AttributesImpl(); AttributesImpl emptyAttr = new AttributesImpl(); public void setup(SourceResolver resolver, Map objectModel, String src, Parameters par) throws ProcessingException, SAXException, IOException { super.setup(resolver, objectModel, src, par); request = ObjectModelHelper.getRequest(objectModel); paramNames = request.getParameterNames(); uri = request.getRequestURI(); } /** * Implement the generate() method from AbstractGenerator. */ public void generate() throws SAXException { contentHandler.startDocument(); contentHandler.startElement("", "doc", "doc", emptyAttr); // <uri> and all following elements will be nested inside the doc element contentHandler.startElement("", "uri", "uri", emptyAttr); contentHandler.characters(uri.toCharArray(),0,uri.length()); contentHandler.endElement("", "uri", "uri"); contentHandler.startElement("", "params", "params", emptyAttr); while (paramNames.hasMoreElements()) { // Get the name of this request parameter. String param = (String)paramNames.nextElement(); String paramValue = request.getParameter(param); // Since we've chosen to reuse one AttributesImpl instance, // we need to call its clear() method before each use. We // use the request.getParameter() method to look up the value // associated with the current request parameter. myAttr.clear(); myAttr.addAttribute("","value","value","",paramValue); // Each <param> will be nested inside the containing <params> element. contentHandler.startElement("", "param", "param", myAttr); contentHandler.characters(param.toCharArray(),0,param.length()); contentHandler.endElement("","param", "param"); } contentHandler.endElement("","params", "params"); contentHandler.startElement("", "date", "date", emptyAttr); String dateString = (new Date()).toString(); contentHandler.characters(dateString.toCharArray(),0,dateString.length()); contentHandler.endElement("", "date", "date"); contentHandler.endElement("","doc", "doc"); contentHandler.endDocument(); } public void recycle() { super.recycle(); this.request = null; this.paramNames = null; this.parNames = null; this.uri = null; } }
Compile and Test
Save this code as RequestExampleGenerator.java and compile as before. You will need to add avalon-framework.jar, excalibur-pool-api.jar, excalibur-datasource.jar, and excalibur-sourceresolve.jar to your classpath this time. (Not all of these may be necessary at this point, but will be later so you might as well add them now.)
For your sitemap, you will need to add a definition for this generator like <map:generator name="requestExample" src="RequestExampleGenerator"/> and you will need a sitemap pipeline like:
<map:match pattern="howYouDoin.xml"> <map:generate type="requestExample"/> <map:serialize type="xml"/> </map:match>
At this point, you should be able to access the example at http://localhost:8080/cocoon/howYouDoin.xml?anyParam=OK&more=better
New Concepts
Lifecycle
First, notice that we now override the setup(...) and recycle() methods defined in AbstractGenerator. The ComponentManager that handles the lifecycle of all components in Cocoon, calls setup(..) before each new call to generate() to give the Generator information about the current request and its environment, and calls recycle() when it is done to enable it to clean up resources as appropriate. Our example uses only the objectModel which abstracts the Request, Response, and Context. We get a reference to the Request wrapper, and obtain an Enumeration of all the GET/POST parameters available.
The src and SourceResolver are provided to enable us to look up and use whatever source is specified in the pipeline setup. Had we specified <map:generate type="helloWorld" src="someSourceString"/> we would have used the SourceResolver to work with "someSourceString", whether it be a file, or url, etc.
We are also given a Parameters reference which we would use to obtain any parameter names and values which are children elements of our map:generate element in the pipeline.
In RequestExampleGenerator.java, ... String param1 = null; String param2 = null; ... public void setup(SourceResolver resolver, Map objectModel, String src, Parameters par) throws ProcessingException, SAXException, IOException { ... param1 = par.getParameter("param1"); param2 = par.getParameter("param2"); } and in sitemap.xmap, ... <map:match pattern="abstractedParameters.xml"/> <map:act type="request"> <map:parameter name="parameters" value="true"/> <map:generate type="requestExample"> <parameter name="param1" value="{visibleName1}"/> <parameter name="param2" value="{visibleName2}"/> </map:generate> </map:act> </map:match> ...
As you can see, we have also hidden the internal name from the outside world who will use ?visibleName1=foo&visibleName2=bar
Nested Elements
In this example, nested elements are created simply by nesting complete startElement()/endElement pairs within each other. If we had a logic failure in our code and sent non-wellformed xml events down the pipeline, nothing in our process would complain (try it!). Of course, any transformers later in the pipeline would behave in an unpredictable manner.
Attributes
Finally, we've introduced the use of attributes. We chose to employ one attributesImpl, clearing it before each element. Multiple attributes for an element would simply be added by repeated calls to addAttribute.
A Lesson
Before moving on, it is worth noting that after all this work, there is already a generator provided with Cocoon which does much of what we have accomplished here - org.apache.cocoon.generation.RequestGenerator which in the default configuration is probably available at http://localhost:8080/cocoon/request
Moving On
From here, we will move on to cover handling ugly pseudo-xml (like real world html) with CDATA blocks, employing some of the Avalon lifecycle method callbacks (Composable/Disposable), Database access, and Caching.
The Employee SQL Example Reworked
In the samples included with Cocoon, there is an example of a SQL query using XSP and ESQL. We will recreate part of that example below using the same HSQL database, which should be automatically configured and populated with data in the default build. If you find that you do not have that database set up, see the ESQL XSP sample for instructions on setting the datasource up. Do note that this specific task is handled in the ESQL XSP example in just a few lines of code. If your task is really this simple, there may be no need to create your own generator.
import java.io.Serializable; import java.sql.Connection; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import java.util.Date; import java.util.Map; import org.apache.avalon.excalibur.datasource.DataSourceComponent; import org.apache.avalon.framework.activity.Disposable; import org.apache.avalon.framework.component.ComponentException; import org.apache.avalon.framework.component.ComponentSelector; import org.apache.avalon.framework.parameters.Parameters; import org.apache.avalon.framework.service.ServiceException; import org.apache.avalon.framework.service.ServiceManager; import org.apache.cocoon.ProcessingException; import org.apache.cocoon.caching.CacheableProcessingComponent; import org.apache.cocoon.environment.SourceResolver; import org.apache.cocoon.generation.ServiceableGenerator; import org.apache.excalibur.source.SourceValidity; import org.xml.sax.SAXException; import org.xml.sax.helpers.AttributesImpl; public class EmployeeGeneratorExample extends ServiceableGenerator implements CacheableProcessingComponent, Disposable { public void dispose() { super.dispose(); manager.release(datasource); datasource = null; } public void recycle() { myAttr.clear(); super.recycle(); } public void setup(SourceResolver resolver, Map objectModel, String src, Parameters par) { // Not neeed for this example, but you would get request // and/or sitemap parameters here. } public void service(ServiceManager manager) throws ServiceException { super.service(manager); ComponentSelector selector = (ComponentSelector) manager .lookup(DataSourceComponent.ROLE + "Selector"); try { this.datasource = (DataSourceComponent) selector .select("personnel"); } catch (ComponentException e) { throw new ServiceException("personnel", "Could not find datasource.", e); } } public void generate() throws SAXException, ProcessingException { try { Connection conn = this.datasource.getConnection(); Statement stmt = conn.createStatement(); ResultSet res = stmt.executeQuery(EMPLOYEE_QUERY); //open the SAX event stream contentHandler.startDocument(); myAttr .addAttribute("", "date", "date", "", (new Date()) .toString()); //open root element contentHandler.startElement("", "content", "content", myAttr); String currentDept = ""; boolean isFirstRow = true; boolean moreRowsExist = res.next() ? true : false; while (moreRowsExist) { String thisDept = attrFromDB(res, "name"); if (!thisDept.equals(currentDept)) { newDept(res, thisDept, isFirstRow); currentDept = thisDept; } addEmployee(res, attrFromDB(res, "id"), attrFromDB(res, "empName")); isFirstRow = false; if (!res.next()) { endDept(); moreRowsExist = false; } } //close root element contentHandler.endElement("", "content", "content"); //close the SAX event stream contentHandler.endDocument(); res.close(); stmt.close(); conn.close(); } catch (SQLException e) { throw new ProcessingException(e); } } public Serializable getKey() { // Default non-caching behaviour. We could implement this later. return null; } public SourceValidity getValidity() { // Default non-caching behaviour. We could implement this later. return null; } private DataSourceComponent datasource; private AttributesImpl myAttr = new AttributesImpl(); private String EMPLOYEE_QUERY = "SELECT department.name, employee.id, employee.name as empName " + "FROM department, employee " + "WHERE department.id = employee.department_id ORDER BY department.name"; private void endDept() throws SAXException { contentHandler.endElement("", "dept", "dept"); } private void newDept(ResultSet res, String dept, boolean isFirstRow) throws SAXException { if (!isFirstRow) { endDept(); } myAttr.clear(); myAttr.addAttribute("", "name", "name", "", dept); contentHandler.startElement("", "dept", "dept", myAttr); } private void addEmployee(ResultSet res, String id, String name) throws SAXException { myAttr.clear(); myAttr.addAttribute("", "id", "id", "", id); contentHandler.startElement("", "employee", "employee", myAttr); contentHandler.characters(name.toCharArray(), 0, name.length()); contentHandler.endElement("", "employee", "employee"); } private String attrFromDB(ResultSet res, String column) throws SQLException { String value = res.getString(column); return (res.wasNull()) ? "" : value; } }
Compile and Test
To compile this, you will now need the following on your classpath: avalon-framework.jar, excalibur-pool-api.jar, excalibur-datasource.jar, and excalibur-sourceresolve.jar (using whatever names they have in your distribution). When you compile this, you may receive some deprecation warnings. Do not worry about them - we will discuss that later.
To test it, copy it over to your WEB-INF\classes\ directory as before and add something like the following to your sitemap.xmap ...
... <map:generator name="employee" src="EmployeeGeneratorExample"/> ... <map:match pattern="employee.xml"> <map:generate type="employee"/> <map:serialize type="xml"/> </map:match> ...
New Concepts
Serviceable and Disposable
We've implemented the Avalon lifecycle interfaces Serviceable and Disposable. When Cocoon starts up (which happens when the servlet container starts up) the ServiceManager will call service(ServiceManager m) for our component as it works its way through all the components declared in the sitemap. The handle to ServiceManager is used to look up any other Avalon components that we need. Lookups happen in an abstracted way using a ROLE which enables us to change out implementations of each component without affecting previously written code. Our generator's ROLE by the way was defined in the Generator interface.
Similarly, when this instance of our generator is disposed of by the container, it will call the dispose() method to allow us to clean up any resources we held on to between invocations. Note that components can be pooled by the container. If we thought that our employee generator was going to see a lot of traffic, we might change its definition at the top of sitemap.xmap to include attributes like pool-max="16" so that multiple overlapping requests could be serviced without a log jam.
Datasource
We look up our HSQL database here by its name given in cocoon.xconf. If we had multiple datasources (say a backup development database and a live one), we could determine which one to use based on a simple configuration parameter in sitemap.xmap. We could get at configuration parameters using the Avalon interface Configurable.
Caching
To get started implementing Caching, first read the Caching concepts documentation. Basically, we would replace our versions of getKey() and getValidity() to return a non-null result.
Briefly, the key is any Serializable object that uniquely identifies the result within the scope of this component (our Generator). In our case, since we are not returning different results based on request parameters, etc. all results from our Generator will be the same, given the same database state. We don't need to make this key globally unique, just unique to us. So, we could implement getKey() as follows:
... public Serializable getKey() { return "doesn't matter here"; }
The trouble in our example comes when we go to implement getValidity(). This is meant to return an org.apache.excalibur.source.SourceValidity which is responsible for informing the Cocoon pipeline internals whether or not a cached response found at some point in the future is still valid (should be served) or not. In the case of a file system resource this would be easy to determine, for instance. In fact, there is a SourceValidity implementation built already for this case ( org.apache.excalibur.source.impl.FileTimeStampValidity).
In the case of database information, one option would be to devise some system for tracking last-modified times on a per-row or per-table basis. We would still need to query the database for each request to determine if our cached version is still useful. Another option (by far the most common in my experience) would be to decide on some time-period during which we will serve the same result whether or not the database has changed in the meantime. To do this, people will usually pick some length of time long enough to realize benefit from caching, but short enough to minimize the appearance of outdated data. There are pre-built options for this as well (org.apache.excalibur.source.impl.ExpiresValidity for example). To keep our results around for five minutes we could implement:
... public SourceValidity getValidity() { // valid for 600 seconds return new ExpiresValidity(1000*600); }
A third more intruiging option is to utilize some event-based system to signal that a given result is invalid when the database (or table or row) is actually updated, and not before. This has the benefit of greatest cacheability, and least chance for outdated data, with the disadvantage that it is somewhat complex. The "eventcache" block in Cocoon (see also the "jms" block) is intended to provide a framework to apply this model in Cocoon pipelines. If your database implements triggers and stored procedures, and enables interaction with JMS or http calls, this approach may worth considering.
Conclusion
We have covered a lot of ground which should provide a great head start in implementing not only your own Generators, but any Transformers or Serializers you may need as well. Of course, you should also now have an improved insight into some of the workings of the Cocoon core, and its interaction with the Avalon-based framework which would have application for implementing any component whether for interaction with the sitemap, or more general use in your application. As is always the case in Open Source Software, you will find the best examples and insight by investigating the existing sources, both those in Cocoon and in its Avalon and Excalibur counterparts as well. Armed with this knowledge you should also be able to ask more educated and focused questions on the public mailing lists which you may find will yield better and faster responses.
Errors and Improvements? If you see any errors or potential improvements in this document please help us: View, Edit or comment on the latest development version (registration required).