Technologies Used

Since many people do not seem to understand the big picture of the technologies used by Cocoon, I will try to explain my vision of them. I will also provide some information that I hope will enable you to jump right in, help with its development, or show your boss how much money can be saved using Cocoon.

The first thing you must understand is that XML is not a language (like HTML), but a syntax, in the same way that ASCII defines a standard way to map characters to bytes rather than to character strings.

XML is usually referred to as portable data in the sense that its parsing is application independent. The same XML parser can read every possible XML document: one describing your bank account, another describing your favorite Italian meal, etc. This is, as you all know, impossible with other text-based or binary file formats. A near-equivalent in the old days was CSV (comma separated values) files, which used a very simple syntax (one record per line, a comma separating fields, and the values in the first row naming the columns). XML, unlike CSV, is much more flexible and structured, even though it's much simpler than SGML.
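The claim that one parser can read every XML document can be tried with a small sketch. It assumes Python and its standard-library `xml.etree.ElementTree` parser; the two documents and the `outline` helper are invented for illustration and are not part of Cocoon.

```python
import xml.etree.ElementTree as ET

# Two unrelated documents; the element names are made up for this sketch.
BANK = "<account><owner>Jane</owner><balance currency='USD'>120.50</balance></account>"
MEAL = "<meal><name>Lasagne</name><course>main</course></meal>"


def outline(xml_text):
    """Parse any well-formed XML text and list its element names in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


print(outline(BANK))  # ['account', 'owner', 'balance']
print(outline(MEAL))  # ['meal', 'name', 'course']
```

The parser knows nothing about bank accounts or meals. Only the code that consumes the parsed tree has to know what the element names mean.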

A particular XML language is defined by its Document Type Definition (DTD). DTDs are described in the XML specification. They describe the syntax of a language implemented in XML. An XML document may be validated against a DTD (if present). If the validation is successful the document is said to be valid XML based on the particular DTD. If a DTD is not present and the parser does not encounter syntax errors parsing the file, the XML document is said to be well-formed. If errors are found, the document is not XML compliant.

So, any valid XML document is well-formed and an XML document valid for one particular DTD may not necessarily be valid for another DTD. For example, HTML is not an XML language because some tags such as <br> are not XML compliant. In XHTML, an XML compliant reformulation of HTML, <br>, for example, is replaced with <br/>. While HTML pages are not always well-formed XML documents (some pages might be), XHTML pages are always well-formed and valid XML documents if they match the XHTML DTD.
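The difference between <br> and <br/> can be checked with any XML parser. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the `is_well_formed` helper is hypothetical, and it tests well-formedness only, because this parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML. Syntax only; no DTD validation is done."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


HTML_STYLE = "<p>first line<br>second line</p>"    # <br> is never closed
XHTML_STYLE = "<p>first line<br/>second line</p>"  # <br/> is an empty element

print(is_well_formed(HTML_STYLE))   # False
print(is_well_formed(XHTML_STYLE))  # True
```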

So much for the technical differences, but why was HTML not good enough? Let's consider an example.

Think of the information services available on the web. They could be web pages that serve up important information about an organization or its structure. They could offer weather information or travel advisories, contact information for people, or stock quotes. One could be a book on how to grow the perfect tomato.

So now we have all this information. Tons of it. Great! Now go and search all those web pages for specific content, like Author or Subject. Find me all abstracts of documents published on the subject of Big Tomatoes, since I only want to view abstracts to find the document best for me. An HTML page is not designed for this. It was designed to describe how the data is presented, not what the data is.

When I look at a web page I might see that an author chose to make every heading bold with <font size="+1">. Yet if I look at another page I might notice that every heading was marked up with <H1>. Yet another page may use tables and table headers to format the data. Find me every document that has the word potato in the first heading.

Suppose I have a web application that serves up weather information for different parts of the country. Let's say you live in Boston, MA and only want the local weather. Your boss asks you to write an application that goes out, grabs the two-to-three sentence weather summary from my application, and displays it on your intranet's homepage.

You take a quick jaunt over to my weather application and notice that the summary is in what looks like the second paragraph of the page. So you take a quick peek at the HTML source that my weather application returns. You suddenly realize that it's all on one line and is buried deep within tables.

So you start writing your little application to parse my HTML code to retrieve only the information you were looking for. You pat yourself on the back when, 4 hours later, you finally get the information you were looking for. Your code looks for the 2nd TABLE, the 6th TR, and then the 2nd TD. Phew. Your application, which really only wants to retrieve weather data, is forced to parse display markup to get it.

You run over to your boss and demonstrate the application you are so proud of writing. Lo and behold it doesn't work. What happened? The good old page author decided to change the layout and move the weather summary to TABLE 1, TR 1, TD 1. Your application breaks because it is tied to the presentation of the data and not to the data itself. Not very effective, since now your app will break every time the page author drinks too much coffee.
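The fragility described above can be reproduced in a few lines. This is a simplified sketch in Python: the two page layouts are invented, use a single table rather than the nested ones in the story, and are written as well-formed markup so that the standard-library parser can read them (real-world HTML would usually need a more tolerant parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical layouts of the same page, before and after a redesign.
OLD_LAYOUT = """<body>
  <table><tr><td>Site menu</td><td>Sunny, highs 65</td></tr></table>
</body>"""
NEW_LAYOUT = """<body>
  <table><tr><td>Sunny, highs 65</td><td>Site menu</td></tr></table>
</body>"""


def scrape_summary(page):
    """Return the text of the 2nd TD in the 1st TR of the 1st TABLE."""
    root = ET.fromstring(page)
    table = root.findall("table")[0]
    row = table.findall("tr")[0]
    return row.findall("td")[1].text


print(scrape_summary(OLD_LAYOUT))  # Sunny, highs 65
print(scrape_summary(NEW_LAYOUT))  # Site menu
```

The second call does not even fail. It silently returns the wrong text, because the code addresses a position in the layout rather than the weather summary itself.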

Then you notice something on the page that interests you. The site is automatically generated from XML, and you see one link to an XML DTD for weather information and another to an XML stream of that information. Yikes, would you look at that:

   <weather-information>
    <location>
     <city>Boston</city>
     <state>MA</state>
    </location>
    <summary>
     Beautiful and Sunny, lows 50, highs 65, with the
     chance of a blizzard and gale force winds.
    </summary>
   </weather-information>

So you simply download Cocoon and quickly write an XSL stylesheet that looks like the following:

   <xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
     ... presentation info here ...
    </xsl:template>
    <xsl:template 
      match="weather-information[location/city = 'Boston']">
     <xsl:apply-templates select="summary"/>
    </xsl:template>
   </xsl:stylesheet>
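For readers who do not have an XSLT processor at hand, the selection that the second template performs can be approximated in Python. This is only a sketch under stated assumptions: it uses the standard-library `xml.etree.ElementTree`, the `summary_for` helper is hypothetical, and it mirrors the match on location/city rather than running the stylesheet itself.

```python
import xml.etree.ElementTree as ET

WEATHER = """<weather-information>
  <location>
    <city>Boston</city>
    <state>MA</state>
  </location>
  <summary>
    Beautiful and Sunny, lows 50, highs 65, with the
    chance of a blizzard and gale force winds.
  </summary>
</weather-information>"""


def summary_for(xml_text, city):
    """Return the whitespace-normalized summary if the document is for `city`, else None."""
    root = ET.fromstring(xml_text)
    if root.tag != "weather-information":
        return None
    if root.findtext("location/city") != city:
        return None
    # Collapse the line breaks and indentation inside <summary>.
    return " ".join(root.findtext("summary", default="").split())


print(summary_for(WEATHER, "Boston"))
print(summary_for(WEATHER, "Chicago"))  # None
```

The lookup names the data it wants (the city and the summary), so it keeps working however the publisher rearranges the presentation.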

As the example above shows, HTML is a language for describing graphics, behavior, and hyperlinks on web pages. HTML is not able to contextualize (i.e. give meaning to some text). For example, if you look for the title of a page, a nice HTML tag gives you that, but if you look for the author or version or something more specific like the author's mail address, even if this information is present in the text, you don't have a way to isolate it (contextualize it) from the surrounding information. In the following HTML page

   <html>
    <head>
     <title>This is my article</title>
    </head>
    <body>
     <h1 align="center">This is my article</h1>
     <h3 align="center">
      by <a href="mailto:stefano@apache.org">
          Stefano Mazzocchi
         </a>
     </h3>
     ...
    </body>
   </html>

you don't have a guaranteed way to extract the mail address. In the following XML document, by contrast, the address is marked explicitly by the <mail> element and is easy to isolate:

   <?xml version="1.0"?>
   <page>
    <title>This is my article</title>
    <author>
     <name>Stefano Mazzocchi</name>
     <mail>stefano@apache.org</mail>
    </author>
    ...
   </page>
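Isolating the address from the XML version takes a single path lookup. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the `author_mail` helper is hypothetical, and the elided part of the page ("...") is simply left out.

```python
import xml.etree.ElementTree as ET

# The XML page from the text, without its elided content.
PAGE = """<?xml version="1.0"?>
<page>
  <title>This is my article</title>
  <author>
    <name>Stefano Mazzocchi</name>
    <mail>stefano@apache.org</mail>
  </author>
</page>"""


def author_mail(xml_text):
    """Return the text of page/author/mail, or None if the element is absent."""
    root = ET.fromstring(xml_text)
    return root.findtext("author/mail")


print(author_mail(PAGE))  # stefano@apache.org
```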

We don't imagine XML overtaking HTML in web publishing since HTML is great for small needs. HTML was born as an SGML-based DTD for scientists' homepages, i.e. to parallelize and simplify the deployment and management of personal information. HTML was not designed for the publishing and processing of large quantities of data and complex dynamic information systems.

As you can see, XML alone is useless without some defined semantics: even if an application is able to parse a document, it must also understand what the markup means. This is why XML-only browsers are meaningless and, from a usability point of view, no more useful than text editors.

This is one of the reasons why XSL (the eXtensible Stylesheet Language) was proposed and designed. XSL is divided into two parts: transformation (XSLT) and formatting objects (sometimes referred to as FO, XSL:FO, or simply XSL). Both are XML DTDs that define a particular XML syntax, so every XSL or XSLT document is a well-formed XML document.

XSLT is a language for transforming one well-formed XML document into something else (which may not necessarily be another XML document, although it most often will be). This means that you can use it to go from one DTD to another in a procedural way that is defined inside your XSLT document. XSLT can be used in ways its name might not imply: a transformation may be applied to a document to generate a graphical description of its content. This is called styling, but, as you can imagine, it is just one of the possible uses of transformation technology.

Back in the earlier example, the HTML file may have been generated from an XML file using another XML file as a transformation sheet (which in this case is a stylesheet). The data is all there: we just have to tell the transformer how to come up with the HTML document once all the data is parsed.

Usually, transformation sheets work from one DTD to another and in this way form a chain: transformA goes from DTD1 to DTD2 and transformB from DTD2 to DTD3 or, graphically:

   DTD1 ---(transformA)---> DTD2 ---(transformB)---> DTD3

We'll call DTD1 the original DTD, DTD2 some intermediate DTD, DTD3 the final DTD. A transformation can always be created to go directly from DTD1 to DTD3, but this might be more complicated and less human-readable/manageable.
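The chain can be sketched with ordinary functions standing in for the two transformation sheets. This is an illustration only: Python's standard library has no XSLT processor, so each step is hand-written tree code, and the intermediate <article> vocabulary and all function names are invented for the example.

```python
import xml.etree.ElementTree as ET

PAGE = """<page>
  <title>This is my article</title>
  <author><name>Stefano Mazzocchi</name></author>
</page>"""


def transform_a(page):
    """DTD1 -> DTD2: reduce a <page> to a flat <article> with a title and a byline."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = page.findtext("title")
    ET.SubElement(article, "byline").text = "by " + page.findtext("author/name")
    return article


def transform_b(article):
    """DTD2 -> DTD3: render the <article> as a minimal HTML tree."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = article.findtext("title")
    ET.SubElement(body, "h3").text = article.findtext("byline")
    return html


def transform_chain(page):
    """DTD1 -> DTD3, obtained by composing the two smaller steps."""
    return transform_b(transform_a(page))


result = ET.tostring(transform_chain(ET.fromstring(PAGE)), encoding="unicode")
print(result)
```

Each step stays small and readable, and the intermediate form can be reused by other final transformations. That is the manageability argument made above for chaining instead of writing one direct DTD1-to-DTD3 sheet.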

XSLFO is a language (an XML DTD) for describing 2D layout of text in both printed and digital media. I will not concentrate on the graphical abilities that formatting objects give you, but rather on the fact that it is mostly used as a final DTD, meaning that a transformation is used to generate a formatting object description of a document starting from a general XML file. For example, the article page shown earlier could be transformed into a formatting objects document such as

  <?xml version="1.0"?>
  <fo:root xmlns:fo="http://www.w3.org/XSL/Format/1.0">
   ...
   <fo:flow font-size="14pt" line-height="14pt">
    <fo:block 
        text-align="centered" 
        font-size="24pt" 
        line-height="28pt">This is my article</fo:block>
    <fo:block 
        space-before.optimum="12pt" 
        text-align="centered">by Stefano Mazzocchi</fo:block>
   </fo:flow>
  </fo:root>

which tells the formatting object formatter (the rendering engine) how to draw and place the text on screen or on paper.

XSL formatting objects and transformations are being specified by the same working group and have a lot of synergy, even though the XSLT specification also includes ways to create HTML and text from XML files.

The Cocoon publishing model is heavily based on the XSLT transformation capabilities. XSLT allows complete separation of content and style (something that is much harder to obtain with HTML, even using CSS2 or other styling technologies). But Cocoon goes further and defines a way of separating content and style from the programming logic that drives server side behavior.

The XSP language defines an XML DTD for separating content and logic for compiled server pages. XSP is, like XSLFO, supposed to be a final DTD. In fact, XSP is rendered into source code and then compiled into binary code. This improves performance, since no parsing or interpretation overhead is incurred at runtime.

In dynamic content generation technology, content and logic are combined: in every page there is a mix of static content and dynamic logic that work together to create the final result, usually using run-time or time-dependent input. XSP is no exception, since it defines a syntax to mix static content and programmatic logic in a way that is independent of both the programming language used and the binary code that the final source rendering generates.

But it must be understood that XSP is just one piece of the framework: just as formatting objects mix style and content, XSP objects mix logic and content. On the other hand, since both are XML DTDs, XSLT can be used to move from pure content to these final DTDs, placing the style and logic on the transformation layers and guaranteeing complete separation and easier maintenance.