Dynamic XML in Cocoon

This project has retired. For details please refer to its Attic page.

Dynamic XML

Introduction

Web publishing is very limited without the ability to create dynamic content. By dynamic XML we mean content that is created as a function of the request parameters or of the state of the requested resource. For this reason, a lot of work and design has been put into Cocoon to allow dynamic XML content to be generated.

The Servlet/JSP model

People are used to writing small Java programs to create their dynamic web content. Servlets, and Java in general, are very powerful, easy to write and fast to debug, but they impose (like any other pure-logic solution) a significant management cost. This is because programmable components like servlets must contain both the logic that generates the dynamic content and all of the static elements (such as static content and style). The need for a more manageable solution soon became apparent.

To fill the gap between Java programmers and web engineers (groups that rarely overlap), Sun proposed the JavaServer Pages (JSP) specification, a markup language (today with both SGML and XML syntax) that allows web engineers to include code in their pages, rather than include pages in their code. The impact of this strategy was significant: servlets were written directly in Java when very little static content was involved; otherwise JSP or other compiled server page technologies were used.

That said, it would seem that using servlets/JSPs to create dynamic XML content is the perfect choice. Unfortunately, design issues suggest that we should take a second look at the technology and understand why this isn't so.

Servlet Chaining Vs. Servlet Nesting

Java Servlets were introduced by the Java Web Server team as a way to allow users to create their own web plug-ins. They were designed to handle the HTTP protocol and all possible dynamic web content (including HTML, XML, images, etc., both text and binary streams). Unfortunately, the need for a componentized request handler was not taken into serious consideration during the design phase, but only later, during implementation.

In fact, the Java Web Server provided the ability to chain multiple servlets, each one acting as a filter on the output of the previous one. Unfortunately, since the API was not designed with this possibility in mind, such a servlet chain is very limited in its behavior and places significant restrictions on how the API can be used. This forced the Servlet API architects to come up with better solutions.

The solution was servlet nesting: the ability for a servlet to transparently include another servlet's output inside its own. This allowed programmers to split different pieces of logic across different servlets.

Unfortunately servlet nesting does not allow you to easily "pass through" information about sessions and cookies, and it basically breaks the HTTP model because it invokes the second servlet from the wrong client (the server). So servlet nesting is just a temporary hack, useful for simple tasks, but not workable in general.

The next solution proposed was Request Dispatching. A servlet could alter the request object (but not the response!) and then forward the request to another servlet (a kind of internal redirect, invisible to the client).

The limitations of the Servlet Interface

However, the problem with all these approaches was very simple: the servlet interface was never designed to be a general-purpose object-oriented processing interface. Textual strings, returned from servlets, are the least time-efficient way of representing XML; DOM objects, used in Cocoon 1, are more time-efficient (and SAX events, used in Cocoon 2, are even more time-efficient).

Additionally, if you then want to use your reusable XML-generating or XML-transforming servlet in a non-servlet environment (for example, a standalone desktop application), you have to somehow "emulate" the functionality of the servlet engine. This is silly: interfaces should not be unnecessarily complex. Cocoon follows the principle: if you don't need to write something as a servlet, don't!
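To make the "SAX events" representation mentioned above a little more concrete, here is a minimal sketch in which a page is emitted as a sequence of ContentHandler callbacks instead of being built as a string or a tree. It uses only the JAXP/SAX classes bundled with the JDK and, in keeping with the principle above, needs no servlet engine. The class name SaxPageDemo and the <page> content are illustrative assumptions, and this is not Cocoon 2's own generator interface. The JDK's identity TransformerHandler is used only to turn the events back into text so the result can be inspected.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class: not part of Cocoon.
final class SaxPageDemo {

    /** Emits <page><p>TEXT</p></page> as SAX events; no string or DOM tree is built here. */
    static void emitPage(ContentHandler out, String text) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        out.startDocument();
        out.startElement("", "page", "page", noAttributes);
        out.startElement("", "p", "p", noAttributes);
        char[] chars = text.toCharArray();
        out.characters(chars, 0, chars.length);
        out.endElement("", "p", "p");
        out.endElement("", "page", "page");
        out.endDocument();
    }

    /** Feeds the events into the JDK's identity TransformerHandler, which serializes them. */
    static String serialize(String text) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            // Output properties must be set before setResult() so they take effect.
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter buffer = new StringWriter();
            handler.setResult(new StreamResult(buffer));
            emitPage(handler, text);
            return buffer.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because characters() hands raw text to the handler, escaping (for example turning "<" into "&lt;") is the serializer's job, not the code that generates the events.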

The Cocoon Model

Rather than turning Cocoon into a servlet engine, thus limiting its portability, this document outlines some solutions that allow Cocoon users to get the servlet-equivalent functionality with internal Cocoon design ideas.

The Cocoon processing model is based on the separation of:

Production -
where XML content is generated based on Request parameters (servlet equivalent)
Processing -
where the produced XML content is transformed/evaluated
Formatting -
where the XML content is finally formatted into the desired output format for client use.
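The three stages above can be pictured as a small pipeline. The sketch below is a hypothetical model, not Cocoon's actual API: the Producer, Processor and Formatter interfaces, the Pipeline class, and the Map of parameters (standing in for the servlet request) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stage interfaces, for illustration only; they are not Cocoon's API.
interface Producer { Document produce(Map<String, String> params); }          // production
interface Processor { Document process(Document doc, Map<String, String> params); } // processing
interface Formatter { String format(Document doc); }                          // formatting

final class Pipeline {
    private final Producer producer;
    private final List<Processor> processors;
    private final Formatter formatter;

    Pipeline(Producer producer, List<Processor> processors, Formatter formatter) {
        this.producer = producer;
        this.processors = List.copyOf(processors);
        this.formatter = formatter;
    }

    /** Runs production, then every processor in order, then formatting. */
    String handle(Map<String, String> params) {
        Document doc = producer.produce(params);
        for (Processor processor : processors) {
            doc = processor.process(doc, params);
        }
        return formatter.format(doc);
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Example wiring: <greeting>NAME</greeting>, upper-cased, then rendered as a heading. */
    static Pipeline example() {
        Producer producer = params -> {
            Document doc = newDocument();
            Element greeting = doc.createElement("greeting");
            greeting.setTextContent(params.getOrDefault("name", "world"));
            doc.appendChild(greeting);
            return doc;
        };
        Processor upperCase = (doc, params) -> {
            Element root = doc.getDocumentElement();
            root.setTextContent(root.getTextContent().toUpperCase(Locale.ROOT));
            return doc;
        };
        // Demo only: the text is not HTML-escaped.
        Formatter heading = doc -> "<h1>Hello, " + doc.getDocumentElement().getTextContent() + "</h1>";
        return new Pipeline(producer, List.of(upperCase), heading);
    }
}
```

Keeping each stage behind its own interface is what lets a new producer, processor or formatter be added without touching the other two.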

This separation of working contexts allows Cocoon users to implement their own internal modules to add the functionality they require to the whole publishing system. In fact, while a few of these components are already shipped with Cocoon, the highly modular structure allows you to build your own to fit your particular needs.

Writing Producers

Producers initiate the request handling phase. They are responsible for evaluating the HttpServletRequest parameters provided and creating the XML content that is fed into the processing reactor. Servlet logic should be translated into a producer if the request parameters can be used directly to generate the XML content (for example, the FileProducer, which loads the requested file from disk).

Here is the code for an example producer distributed with Cocoon. It builds its document from a string, which, as mentioned above, is not as fast as building a DOM object directly; we wanted to keep the example really simple, and for a page this small the speed difference would not be noticeable anyway.

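import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.servlet.http.HttpServletRequest;
import org.apache.cocoon.framework.Status;
import org.apache.cocoon.producer.AbstractProducer;

public class DummyProducer
  extends AbstractProducer
  implements Status
{

  // A fixed XML document; the cocoon-format PI requests text/html output.
  String dummy = "<?xml version=\"1.0\"?>"
      + "<?cocoon-format type=\"text/html\"?>"
      + "<html><body>"
      + "<h1 align=\"center\">"
          + "Hello from a dummy page"
      + "</h1>"
      + "</body></html>";

  // Returns the generated XML; the request is ignored because the page is static.
  public Reader getStream(HttpServletRequest request)
    throws IOException
  {
    return new StringReader(dummy);
  }

  // This producer is not backed by a file, so there is no path to report.
  public String getPath(HttpServletRequest request) {
    return "";
  }

  // Short description of this component (required by the Status interface).
  public String getStatus() {
    return "Dummy Producer";
  }
}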

The key method is getStream(), which is responsible for processing the given servlet request and providing a Reader from which the generated XML document can be read.

Note that AbstractProducer also has another method, getDocument(request), which returns a DOM tree directly. If you need to make your servlet code Cocoon-aware, the example above should show you what to do.
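For comparison, here is a sketch of how the same dummy page could be built directly as a DOM tree with the JDK's JAXP API, avoiding the parse step that the string version needs. DummyDocumentBuilder is a hypothetical standalone helper: it does not override getDocument(request), and wiring it into AbstractProducer is not shown because that depends on Cocoon internals not described here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not a Cocoon class.
final class DummyDocumentBuilder {

    /** Builds the same page as the DummyProducer string, but directly as a DOM tree. */
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Same processing instruction as in the string version, placed before the root element.
        doc.appendChild(doc.createProcessingInstruction("cocoon-format", "type=\"text/html\""));

        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element h1 = doc.createElement("h1");
        h1.setAttribute("align", "center");
        h1.appendChild(doc.createTextNode("Hello from a dummy page"));

        body.appendChild(h1);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }
}
```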

Please look at the shipped producers' source code for example code and look at the user guide for how to install and use your own producers.

Writing Processors

If your servlet needs many parameters to work, it is more reasonable to write a Processor instead. A Processor transforms a given XML document (which, in this case, should contain the needed static parameters) into something else, driven both by the input document and by the request object, which is also available.

Here is a simple processor example that should show what this means. Suppose you have the following document as input (note that it may have come from a file or another source, or have been generated dynamically, as described in the previous section):


	<?xml version="1.0"?>
	<page>
	  <p>Current time is <time/></p>
	</page>

Our simple example processor will look for the <time/> tags and will expand them to the current local time, creating this result document:


	<?xml version="1.0"?>
	<page>
	  <p>Current time is 6:48PM</p>
	</page>
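A minimal sketch of the tree manipulation such a processor performs, assuming the input arrives as a W3C DOM Document. TimeExpander and its methods are hypothetical names, and the class does not implement Cocoon's processor interface; it only shows the <time/> expansion itself. The time is passed in as a parameter so the result is predictable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class; it does not implement any Cocoon interface.
final class TimeExpander {
    // Pattern chosen to give output such as "6:48PM", as in the sample result.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("h:mma", Locale.US);

    /** Replaces every <time/> element with a text node holding the given time. */
    static Document expand(Document doc, LocalTime now) {
        // getElementsByTagName returns a live list, so snapshot it before editing the tree.
        NodeList live = doc.getElementsByTagName("time");
        List<Element> tags = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            tags.add((Element) live.item(i));
        }
        String formatted = now.format(FORMAT);
        for (Element tag : tags) {
            tag.getParentNode().replaceChild(doc.createTextNode(formatted), tag);
        }
        // Merge "Current time is " and the new text node into a single text node.
        doc.normalize();
        return doc;
    }

    /** Same as above, using the server's current local time. */
    static Document expand(Document doc) {
        return expand(doc, LocalTime.now());
    }

    /** Parses an XML string, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling TimeExpander.expand(doc) uses the server's clock; the two-argument form makes the output reproducible, which is convenient for testing.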

Please look at the shipped processors' source code for example code and look at the user guide for how to install and use your own processors.

Using Cocoon processors

The above example shows a very simple situation but needs non-trivial code to implement it. For this reason, the Cocoon distribution includes a number of processors that implement common needs and situations. These are:

The XSLT processor -
Applies XSLT transformations to the input document. Thanks to its extensible and programmable nature, XSLT covers general transformation needs as well as simple tag evaluation and processing. XSLT is a W3C Recommendation.
The XSP processor -
Evaluates XSP pages and compiles them into producers. This processor allows you to include programmatic logic in your pages as well as to separate the logic from the content. See the XSP user guide for more information. Note that the XSP processor assumes that it is getting its input from a static file, so it will not work well with pre-processing. Its design means that it should really have been a Producer in the first place, not a Processor; this change has been made in Cocoon 2.
The SQL processor (Deprecated) -
Evaluates simple tags describing SQL queries to JDBC drivers and formats their result sets in XML depending on given parameters. See the SQL processor user guide for more information. Note: this processor is deprecated; users are advised to use the XSP SQL taglib or the more powerful Extended SQL (ESQL) taglib instead. The ESQL taglib allows easy post-processing of output within XSP, amongst other things, whilst the SQL taglib is mainly provided for backward compatibility.
The LDAP processor -
Evaluates simple tags describing LDAP queries to directory services and formats their result sets in XML depending on given parameters. See the LDAP processor user guide for more information.
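As an illustration of the kind of transformation the XSLT processor performs, the sketch below applies a small stylesheet to the <page> document from the time example, using the JDK's built-in JAXP TransformerFactory. This is not how Cocoon itself selects or configures its XSLT processor; the stylesheet and the XsltDemo class are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of Cocoon.
final class XsltDemo {
    // Turns <page> into an HTML page and copies each <p> paragraph's text.
    static final String PAGE_TO_HTML =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/page\">"
      + "<html><body><xsl:apply-templates/></body></html>"
      + "</xsl:template>"
      + "<xsl:template match=\"p\"><p><xsl:value-of select=\".\"/></p></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the input document and returns the serialized result. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here the stylesheet, not Java code, decides what the output looks like, which is why XSLT can replace many small hand-written processors.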