Chapter 2. Pipelines

2.1. What is a pipeline?

A Cocoon 3 pipeline expects one or more component(s). These components get linked with each other in the order they were added. There is no restriction on the content that flows through the pipeline.

A pipeline works based on two fundamental concepts:

  • The first component of a pipeline is of type org.apache.cocoon.pipeline.component.Starter. The last component is of type org.apache.cocoon.pipeline.component.Finisher.

  • In order to link two components with each other, the first has to be an org.apache.cocoon.pipeline.component.Producer and the second an org.apache.cocoon.pipeline.component.Consumer.

When the pipeline links the components, it merely checks whether the above-mentioned interfaces are present. The pipeline therefore does not know about the specific capabilities or the compatibility of the components. It is the responsibility of the Producer to decide whether a specific Consumer can be linked to it (that is, whether it can produce output in the format the Consumer expects). It is also conceivable that a Producer accepts different types of Consumer and adjusts its output format accordingly.
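The linking rule described above can be sketched in a few lines of plain Java. The interfaces and classes below (Component, TextConsumer, TextGenerator, TextSink, SketchPipeline) are simplified stand-ins invented for this illustration and are not the Cocoon 3 types; the sketch only shows that the pipeline checks the structural interfaces while the producer itself decides whether it accepts a given consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the structural interfaces (hypothetical, not the Cocoon 3 API).
interface Component { }
interface Starter extends Component { void execute(); }
interface Finisher extends Component { }
interface Consumer extends Component { }
interface Producer extends Component {
    // The producer decides whether it can feed this particular consumer.
    void setConsumer(Consumer consumer);
}

// A content-specific consumer type, comparable in role to SAXConsumer.
interface TextConsumer extends Consumer { void accept(String text); }

class TextGenerator implements Starter, Producer {
    private final String text;
    private TextConsumer consumer;

    TextGenerator(String text) { this.text = text; }

    public void setConsumer(Consumer consumer) {
        // Compatibility check done by the producer, not by the pipeline.
        if (!(consumer instanceof TextConsumer)) {
            throw new IllegalArgumentException("TextGenerator needs a TextConsumer");
        }
        this.consumer = (TextConsumer) consumer;
    }

    public void execute() { consumer.accept(text); }
}

class TextSink implements TextConsumer, Finisher {
    final StringBuilder out = new StringBuilder();
    public void accept(String text) { out.append(text); }
}

class SketchPipeline {
    private final List<Component> components = new ArrayList<>();

    void addComponent(Component component) { components.add(component); }

    // Checks only the structural interfaces, then delegates to each producer.
    void link() {
        if (components.isEmpty() || !(components.get(0) instanceof Starter)) {
            throw new IllegalStateException("first component must be a Starter");
        }
        if (!(components.get(components.size() - 1) instanceof Finisher)) {
            throw new IllegalStateException("last component must be a Finisher");
        }
        for (int i = 0; i + 1 < components.size(); i++) {
            Component current = components.get(i);
            Component next = components.get(i + 1);
            if (!(current instanceof Producer) || !(next instanceof Consumer)) {
                throw new IllegalStateException("cannot link component " + i);
            }
            ((Producer) current).setConsumer((Consumer) next);
        }
    }

    void execute() { ((Starter) components.get(0)).execute(); }
}
```

In this sketch a TextGenerator followed by a TextSink links successfully, whereas handing the generator any other kind of Consumer is rejected by the generator and not by the pipeline.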

2.1.1. Linear pipelines

A Cocoon 3 pipeline always goes through the same sequence of components to produce its output. There is no support for conditionals, loops, tees or alternative flows in the case of errors. The reasons for this restriction are simplicity and the fact that non-linear pipelines are more difficult (or even impossible) to cache. In practice this means that a pipeline has to be constructed completely at build time.

If non-linear XML pipes with runtime-support for conditionals, loops, tees and error-flows are a requirement for you, see the XProc standard of the W3C. There are several available implementations for it.

2.1.2. Pipelines by example

But let's get more specific by giving an example: Cocoon has become famous for its SAX pipelines, which consist of exactly one SAX-based XML generator, zero or more SAX-based XML transformers and exactly one SAX-based XML serializer. Of course, these specific SAX-based XML pipelines can be built by using general Cocoon 3 pipelines: generators, transformers and serializers are pipeline components. A generator is a Starter and a Producer; a transformer can be neither a Starter nor a Finisher but is always a Producer and a Consumer; and a serializer is a Consumer and a Finisher.

Here is some Java code that demonstrates how a pipeline can be utilized with SAX-based XML components:

Pipeline<SAXPipelineComponent> pipeline = new NonCachingPipeline<SAXPipelineComponent>();(1)
pipeline.addComponent(new XMLGenerator("<x></x>"));                                      (2)
pipeline.addComponent(new XSLTTransformer(this.getClass().getResource("/test1.xslt")));  (3)
pipeline.addComponent(new XSLTTransformer(this.getClass().getResource("/test2.xslt")));  (4)
pipeline.addComponent(new XMLSerializer());                                              (5)

pipeline.setup(System.out);                                                              (6)
pipeline.execute();                                                                      (7)
1

Create a NonCachingPipeline. It's the simplest available pipeline implementation. The org.apache.cocoon.pipeline.Pipeline interface doesn't impose any restrictions on the content that flows through it.

2

Add a generator that implements the org.apache.cocoon.pipeline.component.PipelineComponent interface to the pipeline by using the pipeline's addComponent(pipelineComponent) method.

The XMLGenerator expects a java.lang.String object and produces SAX events by using a SAX parser. Hence it has to implement the org.apache.cocoon.sax.component.SAXProducer interface.

The SAXProducer interface extends the org.apache.cocoon.pipeline.component.Producer interface. This means that it expects the next (or the same!) component to implement the org.apache.cocoon.pipeline.component.Consumer interface. The check that the next pipeline component is of type org.apache.cocoon.sax.component.SAXConsumer isn't done at interface level but by the implementation (for details, see org.apache.cocoon.sax.component.AbstractXMLProducer, from which XMLGenerator inherits).

Since a generator is the first component of a pipeline, it also has to implement the Starter interface.

3

Add a transformer that implements the org.apache.cocoon.pipeline.component.PipelineComponent interface to the pipeline by using the pipeline's addComponent(pipelineComponent) method.

This XSLTTransformer expects the java.net.URL of an XSLT stylesheet. It uses the rules of the stylesheet to add, change or delete nodes of the XML SAX stream.

Since it implements the org.apache.cocoon.pipeline.component.Consumer interface, it fulfills the general contract that a Consumer is linked with a Producer. By implementing the org.apache.cocoon.sax.component.SAXConsumer interface, it fulfills the specific requirement of the previous XMLGenerator that expects a next pipeline component of that type.

This transformer also implements the org.apache.cocoon.sax.component.SAXProducer interface. This interface extends the org.apache.cocoon.pipeline.component.Producer interface, which means that the next component has to be an org.apache.cocoon.pipeline.component.Consumer. Like the previous XMLGenerator, the XSLTTransformer inherits from org.apache.cocoon.sax.component.AbstractXMLProducer, which contains the check that the next component is of type org.apache.cocoon.sax.component.SAXConsumer.

4

Add another transformer to the pipeline. A pipeline can contain any number of components that implement the Producer and Consumer interfaces at the same time. However, they must be neither of type Starter nor of type Finisher.

5

Add a serializer that implements the org.apache.cocoon.pipeline.component.PipelineComponent interface to the pipeline by using the pipeline's addComponent(pipelineComponent) method.

The XML serializer receives SAX events and serializes them into a java.io.OutputStream.

A serializer component is the last component of a pipeline and hence it has to implement the org.apache.cocoon.pipeline.component.Finisher interface.

Since it receives SAX events, it implements the org.apache.cocoon.sax.component.SAXConsumer interface.

6

A pipeline has to be initialized first by calling its setup(outputStream) method. This method expects the output stream to which the pipeline result should be written.

7

After the pipeline has been initialized, it can be executed by invoking its execute() method. The first pipeline component, a Starter, is invoked, which triggers the next component, and so on. Finally the last pipeline component, a Finisher, is reached; it is responsible for the serialization of the pipeline content.

Once the pipeline has been started, it either succeeds or fails. There is no way to react to any (error) conditions.

Table 2.1. SAX components and their interfaces

Component type   Structural interfaces                  Content-specific interfaces
---------------  -------------------------------------  ---------------------------
SAX generator    Starter, Producer, PipelineComponent   SAXProducer
SAX transformer  Producer, Consumer, PipelineComponent  SAXProducer, SAXConsumer
SAX serializer   Finisher, Consumer, PipelineComponent  SAXConsumer
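For readers who want to experiment with the generator, transformer and serializer roles without Cocoon on the classpath, the same shape can be approximated with JDK classes only. This is a sketch under that assumption and does not use the Cocoon 3 API: a SAX parser plays the generator, an XMLFilterImpl subclass (the hypothetical UpperCaseNames) plays the transformer, and an identity TransformerHandler plays the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class JdkSaxPipelineSketch {

    // Plays the transformer role: consumes SAX events and produces modified ones.
    static class UpperCaseNames extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, localName.toUpperCase(Locale.ROOT),
                    qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(uri, localName.toUpperCase(Locale.ROOT),
                    qName.toUpperCase(Locale.ROOT));
        }
    }

    static String run(String xml) {
        try {
            // Generator role: a SAX parser that starts the event stream.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader generator = parsers.newSAXParser().getXMLReader();

            // Serializer role: an identity TransformerHandler writing to a character stream.
            SAXTransformerFactory handlers =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = handlers.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            // Link generator -> transformer -> serializer, then start the pipeline.
            UpperCaseNames transformer = new UpperCaseNames();
            transformer.setParent(generator);
            transformer.setContentHandler(serializer);
            transformer.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling JdkSaxPipelineSketch.run with a small document returns the serialized result with upper-cased element names. As in a Cocoon pipeline, the whole chain is fixed before parsing starts.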

2.2. Pipeline implementations

TBW: noncaching, caching, async-caching, expires caching, own implementations

2.3. Embedding a pipeline

TBW: Passing parameters to the pipeline and its components, finish() method

2.4. SAX components

concept, writing custom SAX components, link to Javadocs

2.4.1. Available components

Link to Javadocs

2.4.2. Writing custom components

2.4.2.1. SAX generator

explain from a user's point of view, what she needs to do to implement one (available abstract classes)

2.4.2.2. SAX transformer

explain from a user's point of view, what she needs to do to implement one

buffering

2.4.2.3. SAX serializer

explain from a user's point of view, what she needs to do to implement one

2.5. StAX components

StAX pipelines provide an alternative API for writing pipeline components. Although they are not as fast as SAX, they provide easier state handling, as the component can control when to pull the next events. This allows for implicit state rather than having to manage the state in the various content handler methods of SAX.

The most visible difference between StAX and SAX components is that a StAX component itself controls the parsing of the input, whereas in SAX the parser controls the pipeline by calling the component. Our implementation of StAX pipelines uses just the StAX interfaces for retrieving events - the writing interface is proprietary in order to avoid multithreading or continuations. So it is really a hybrid process - the StAX component is called to generate the next events, but it is also allowed to read as much data from the previous pipeline component as it wants. However, because the produced events are kept in memory until a later component pulls them, a component should not emit large amounts of events during one invocation.
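The idea of implicit state can be illustrated with the plain JDK StAX API. The class PullStateSketch and its methods are assumptions made for this example and are not part of Cocoon; the sketch assumes that title elements contain text only. Being "inside a title" is expressed by the call to readText rather than by a flag that SAX callbacks would have to maintain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PullStateSketch {

    // Collects the text content of every <title> element in the document.
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "title".equals(event.asStartElement().getName().getLocalPart())) {
                    // The component decides to keep pulling; the call stack holds the state.
                    result.add(readText(reader));
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    // Pulls events until the first end tag; assumes no nested elements inside the title.
    private static String readText(XMLEventReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isEndElement()) {
                break;
            }
            if (event.isCharacters()) {
                text.append(event.asCharacters().getData());
            }
        }
        return text.toString();
    }
}
```

A SAX version of the same logic would need a field that startElement sets and endElement clears, plus a buffer filled by the characters callback.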

2.5.1. Available components

  • StAXGenerator is a Starter and normally parses an XML document from an InputStream.

  • StAXSerializer is a Finisher and writes the StAX events to an OutputStream.

  • AbstractStAXTransformer is the abstract base class for new transformers. It simplifies the task by providing a template method for generating the new events.

  • StAXCleaningTransformer is a transformer that removes whitespace and comments from the document.

  • IncludeTransformer includes the contents of another document.

For further information refer to the Javadoc.
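To give an impression of what a cleaning step does at the event level, here is a JDK-only sketch that drops comments and whitespace-only text with a javax.xml.stream.EventFilter. It is not the StAXCleaningTransformer itself, whose exact rules are not described here; the class CleaningFilterSketch and the trace format it returns are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class CleaningFilterSketch {

    // Rejects comments and whitespace-only character events, accepts everything else.
    static final EventFilter CLEANING = new EventFilter() {
        @Override
        public boolean accept(XMLEvent event) {
            if (event.getEventType() == XMLEvent.COMMENT) {
                return false;
            }
            return !(event.isCharacters()
                    && event.asCharacters().getData().trim().isEmpty());
        }
    };

    // Returns a compact trace of the element and text events that survive the filter.
    static String clean(String xml) {
        StringBuilder trace = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));
            XMLEventReader cleaned = factory.createFilteredReader(raw, CLEANING);
            while (cleaned.hasNext()) {
                XMLEvent event = cleaned.nextEvent();
                if (event.isStartElement()) {
                    trace.append('<')
                         .append(event.asStartElement().getName().getLocalPart())
                         .append('>');
                } else if (event.isEndElement()) {
                    trace.append("</")
                         .append(event.asEndElement().getName().getLocalPart())
                         .append('>');
                } else if (event.isCharacters()) {
                    trace.append(event.asCharacters().getData());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return trace.toString();
    }
}
```

The trace ignores attributes and namespaces on purpose; it is only meant to make the effect of the filter visible.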

2.5.2. Writing custom components

2.5.2.1. StAX generator

The StAXGenerator is a Starter component and produces XMLEvents.

import java.io.InputStream;

import javax.xml.stream.FactoryConfigurationError;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

import org.apache.cocoon.pipeline.SetupException;
import org.apache.cocoon.pipeline.component.Starter;
// AbstractStAXProducer must be imported as well; its package is not shown in this listing.

public class MyStAXGenerator extends AbstractStAXProducer implements Starter {           (1)

   private XMLEventReader reader;
    
   public MyStAXGenerator(InputStream inputStream) {
      try {
         this.reader = XMLInputFactory.newInstance().createXMLEventReader(inputStream);  (2)
      } catch (XMLStreamException e) {
         throw new SetupException("Error while setting up an XMLEventReader on the input stream", e);
      } catch (FactoryConfigurationError e) {
         throw new SetupException("Error while setting up the XMLInputFactory for creating an XMLEventReader", e);
      }
   }
    
   public void execute() {
      this.getConsumer().initiatePullProcessing();                                       (3)
   }

   public boolean hasNext() {
      return this.reader.hasNext();                                                      (4)
   }

   public XMLEvent nextEvent() throws XMLStreamException {
      return this.reader.nextEvent();                                                    (5)
   }
    
   public XMLEvent peek() throws XMLStreamException {
      return this.reader.peek();                                                         (6)
   }
	
}	
1

To implement your own StAXGenerator, the easiest approach is to inherit from AbstractStAXProducer.

2

The constructor creates a new XMLEventReader for reading from the input stream.

3

The pipeline is started using the execute method. As StAX is a pull-based approach, the last component has to start pulling.

4

This method should return true if the generator has a next event.

5

Returns the next event from the generator.

6

Returns the next event from the generator without actually moving to the next event.
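The difference between peek and nextEvent can be checked directly against the JDK's XMLEventReader, to which the generator above delegates. The helper class PeekSketch is hypothetical and exists only for this demonstration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PeekSketch {

    // Returns true if peek() and the following nextEvent() report the same start element.
    static boolean peekDoesNotAdvance(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            reader.nextEvent();                  // consume the start-of-document event
            XMLEvent peeked = reader.peek();     // look ahead, position unchanged
            XMLEvent next = reader.nextEvent();  // now actually advance
            return peeked.isStartElement()
                    && next.isStartElement()
                    && peeked.asStartElement().getName()
                             .equals(next.asStartElement().getName());
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This look-ahead is what lets a component inspect an event and leave it for someone else if it turns out not to be interesting.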

2.5.2.2. StAX transformer

Implementing a StAX transformer should be the most common use case. The AbstractStAXTransformer provides a foundation for new transformers. But to make writing new transformers even simpler, let's describe another feature first:

2.5.2.2.1. Navigator

Navigators allow an easier navigation in the XML document. They also simplify transformers, as usually transformers need only process some parts of the input document and the navigator helps to identify the interesting parts. There are several implementations already included:

  • FindStartElementNavigator finds the start tag with certain properties (name, attribute).

  • FindEndElementNavigator finds the end tag with certain properties (name, attribute).

  • FindCorrespondingStartEndElementPairNavigator finds both the start and the corresponding end tag.

  • InSubtreeNavigator finds whole subtrees, by specifying the properties of the "root" element.

For further information refer to the navigator Javadoc.

2.5.2.2.1.1. Using navigators

Using a navigator is a rather simple task. The transformer peeks or gets the next event and calls Navigator.fulfillsCriteria - if true is returned, the transformer should process that event somehow.
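The usage pattern can be sketched with JDK classes and a stand-in interface. SketchNavigator copies only the two method names used in this chapter, and StartTagNavigator is a hypothetical implementation written for this example; neither is the Cocoon 3 type.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Stand-in with the two methods named in this chapter; not the Cocoon 3 interface itself.
interface SketchNavigator {
    boolean fulfillsCriteria(XMLEvent event);
    boolean isActive();
}

// Hypothetical navigator that matches start tags with a given local name.
class StartTagNavigator implements SketchNavigator {
    private final String name;
    private boolean active;

    StartTagNavigator(String name) { this.name = name; }

    public boolean fulfillsCriteria(XMLEvent event) {
        active = event.isStartElement()
                && name.equals(event.asStartElement().getName().getLocalPart());
        return active;
    }

    public boolean isActive() { return active; }
}

class NavigatorUsageSketch {

    // Pulls every event and counts those the navigator considers interesting.
    static int countMatches(String xml, SketchNavigator navigator) {
        int matches = 0;
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (navigator.fulfillsCriteria(event)) {
                    matches++;   // a real transformer would process the event here
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return matches;
    }
}
```

The loop stays free of matching logic, which is the point of delegating that decision to a navigator.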

2.5.2.2.1.2. Implementing a navigator

Creating a new navigator is a rather simple task and just means implementing two methods:

import javax.xml.stream.events.XMLEvent;

public class MyNavigator implements Navigator {
   public boolean fulfillsCriteria(XMLEvent event) {                                     (1)
      return false;
   }
    
   public boolean isActive() {                                                           (2)
      return false;
   }
}
	
1

This method returns true if the event matches the criteria of the navigator.

2

Returns the result of the last invocation of fulfillsCriteria.
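A navigator that does real work usually has to keep a little state between calls. The following sketch tracks whole subtrees with a depth counter. It is written against the JDK only, so it declares the two methods without implementing the Cocoon Navigator interface, and it is not the code of InSubtreeNavigator; the class name and the textInside helper are assumptions for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class SubtreeNavigatorSketch {
    private final String rootName;
    private int depth;        // 0 means "outside of any matched subtree"
    private boolean active;

    SubtreeNavigatorSketch(String rootName) { this.rootName = rootName; }

    // True for every event from the matching start tag up to its corresponding end tag.
    public boolean fulfillsCriteria(XMLEvent event) {
        if (depth == 0) {
            // Outside: only a matching start tag opens a subtree.
            active = event.isStartElement()
                    && rootName.equals(event.asStartElement().getName().getLocalPart());
            if (active) {
                depth = 1;
            }
        } else {
            // Inside: everything belongs to the subtree, including the closing tag.
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            active = true;
        }
        return active;
    }

    // Result of the last invocation of fulfillsCriteria.
    public boolean isActive() { return active; }

    // Demonstration helper: concatenates the text found inside the matched subtrees.
    static String textInside(String xml, String rootName) {
        SubtreeNavigatorSketch navigator = new SubtreeNavigatorSketch(rootName);
        StringBuilder text = new StringBuilder();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (navigator.fulfillsCriteria(event) && event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }
}
```

Because fulfillsCriteria updates the depth counter, it has to be called exactly once per event and in document order.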

2.5.2.2.2. Implementing a transformer

The next example shows a transformer featuring navigators and implicit state handling through function calls.

public class DaisyLinkRewriteTransformer extends AbstractStAXTransformer {
   @Override
   protected void produceEvents() throws XMLStreamException {
      while (this.getParent().hasNext()) {
         XMLEvent event = this.getParent().nextEvent();
         if (this.anchorNavigator.fulfillsCriteria(event)) {                             (1)
            ArrayList<XMLEvent> innerContent = new ArrayList<XMLEvent>();
            LinkInfo linkInfo = this.collectLinkInfo(innerContent);                      (2)
            if(linkInfo != null) {
               linkInfo.setNavigationPath(this.getAttributeValue(event.asStartElement(),
                  PUBLISHER_NS, "navigationPath"));

               this.rewriteAttributesAndEmitEvent(event.asStartElement(), linkInfo);     (3)

               if(innerContent.size() != 0) {
                  this.addAllEventsToQueue(innerContent);                                (4)
               }
            } 
            /* ... */
         } 
         /* ... */
      }
   }

   private LinkInfo collectLinkInfo(List<XMLEvent> events) throws XMLStreamException {
      Navigator linkInfoNavigator = new InSubtreeNavigator(LINK_INFO_EL);                (5)
      Navigator linkInfoPartNavigator = new FindStartElementNavigator(LINK_PART_INFO_EL);
      LinkInfo linkInfo = null;

      while (this.getParent().hasNext()) {
         XMLEvent event = this.getParent().peek();                                       (6)

         if (linkInfoNavigator.fulfillsCriteria(event)) {
            event = this.getParent().nextEvent();
            if (linkInfoPartNavigator.fulfillsCriteria(event)) {
               /* ... */
               String fileName = this.getAttributeValue(event.asStartElement(),"fileName");
               if (!"".equals(fileName)) {
                  linkInfo.setFileName(fileName);
               }
            } /* ... */
         } else if (event.isCharacters()) {
            events.add(this.getParent().nextEvent());
         } else {
            return linkInfo;
         }
      }
      return linkInfo;
   }

   private void rewriteAttributesAndEmitEvent(StartElement event, LinkInfo linkInfo) ;

}
1

The transformer checks for anchors in the XML.

2

If an anchor is found, it invokes a method which parses the link info if there is any. The additional list is for returning any events which were read but do not belong to the link info.

3

This method finally writes the start tag with the correct attributes taken from the parsed LinkInfo.

4

The events, which were read but not parsed, are finally added to the output of the transformer.

5

The parser for the LinkInfo object itself also uses navigators ...

6

... and reads more events from the parent.
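The interplay between reading ahead from the parent and queueing produced events can be reduced to a small JDK-only sketch. The class SkipSubtreeTransformerSketch and its hasNext, nextEvent and produceEvents methods are assumptions modelled on the description in this chapter; they are not the implementation of AbstractStAXTransformer. The transformer drops every subtree whose root element has a given name.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class SkipSubtreeTransformerSketch {
    private final XMLEventReader parent;                       // previous component
    private final String skippedName;
    private final Deque<XMLEvent> queue = new ArrayDeque<>();  // produced, not yet pulled

    SkipSubtreeTransformerSketch(XMLEventReader parent, String skippedName) {
        this.parent = parent;
        this.skippedName = skippedName;
    }

    boolean hasNext() throws XMLStreamException {
        // Produce on demand until there is something to hand out or the input ends.
        while (queue.isEmpty() && parent.hasNext()) {
            produceEvents();
        }
        return !queue.isEmpty();
    }

    XMLEvent nextEvent() throws XMLStreamException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return queue.removeFirst();
    }

    // One invocation reads as much as it needs from the parent and queues at most one event.
    private void produceEvents() throws XMLStreamException {
        XMLEvent event = parent.nextEvent();
        if (event.isStartElement()
                && skippedName.equals(event.asStartElement().getName().getLocalPart())) {
            skipSubtree();
        } else {
            queue.addLast(event);
        }
    }

    // Implicit state: the loop itself knows that it is inside the skipped subtree.
    private void skipSubtree() throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && parent.hasNext()) {
            XMLEvent event = parent.nextEvent();
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
        }
    }

    // Demonstration helper: the text that remains after the subtrees have been removed.
    static String textOf(String xml, String skippedName) {
        StringBuilder text = new StringBuilder();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            SkipSubtreeTransformerSketch transformer =
                    new SkipSubtreeTransformerSketch(reader, skippedName);
            while (transformer.hasNext()) {
                XMLEvent event = transformer.nextEvent();
                if (event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }
}
```

Since produceEvents queues at most one event per call, the in-memory queue stays small, which matches the advice not to emit large amounts of events during one invocation.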

2.5.2.3. StAX serializer

The StAXSerializer pulls and serializes the StAX events from the pipeline.

public class NullSerializer extends AbstractStAXPipelineComponent 
   implements StAXConsumer, Finisher {
   
   private StAXProducer parent;                                                          (1)

   public void initiatePullProcessing() {                                                (2)
      try {
         while (this.parent.hasNext()) {
            XMLEvent event = this.parent.nextEvent();                                    (3)
            /* serialize Event */
         }
      } catch (XMLStreamException e) {
         throw new ProcessingException("Error during writing output elements.", e);
      }
   }

   public void setParent(StAXProducer parent) {                                          (4)
      this.parent = parent;
   }

   public String getContentType()  ;                                                     (5)
   public void setOutputStream(OutputStream outputStream) ;
}
1

The Finisher has to pull from the previous pipeline component.

2

In the case of StAX, the last pipeline component has to start pulling events.

3

The serializer pulls the next event from the previous component and should then serialize it.

4

During pipeline construction, setParent is called to set the previous component of the pipeline.

5

These two methods are defined in the Finisher; they allow the OutputStream to be set (if the serializer needs one) and the content type of the result to be retrieved.
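What the placeholder "serialize Event" could look like is sketched below with the JDK's XMLEventWriter. The class PullSerializerSketch is hypothetical and reads directly from an XMLEventReader instead of a Cocoon StAXProducer; it illustrates the pull loop and is not the code of StAXSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class PullSerializerSketch {

    // Pulls all events from the parent and writes them to an in-memory output stream.
    static String serialize(String xml) {
        try {
            XMLEventReader parent =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            while (parent.hasNext()) {
                writer.add(parent.nextEvent());   // pull one event, serialize it
            }
            writer.close();                       // flushes buffered output
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The loop has the same shape as initiatePullProcessing above: the last component drives the whole pipeline by asking its parent for events.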

2.5.3. Using StAX and SAX components in the same pipeline

The StAX pipeline offers interoperability with SAX components to a certain degree. However, due to their different paradigms, only two use cases are currently implemented: wrapping a SAX component in a StAX pipeline, and a StAX-to-SAX pipeline, which starts with StAX components and finishes with SAX.

2.5.3.1. Wrapping a SAX component in a StAX pipeline

This allows existing SAX components to be used in a StAX pipeline. Beware of the overhead of the StAX->SAX->StAX conversion - no performance gains from a SAX component can be expected.

Pipeline<StAXPipelineComponent> pipeStAX = new NonCachingPipeline<StAXPipelineComponent>();
pipeStAX.addComponent(new StAXGenerator(input));                                         (1)
pipeStAX.addComponent(new SAXForStAXPipelineWrapper(new CleaningTransformer()));         (2)
pipeStAX.addComponent(new StAXSerializer());                                             (3)
pipeStAX.setup(System.out);
pipeStAX.execute();
1

The pipeline uses a StAXGenerator - which produces StAX events.

2

In order to embed a single SAX component in a StAX pipeline, the SAXForStAXPipelineWrapper is needed. The constructor argument is the SAX component.

3

Although the CleaningTransformer would emit SAX calls, the wrapper converts them back to the appropriate StAX events that the StAXSerializer can write.

2.5.3.2. StAX-to-SAX pipeline

This adapter allows StAX and SAX components to be mixed - but it is limited to starting with StAX and then switching to SAX.

Pipeline<PipelineComponent> pipeStAX = new NonCachingPipeline<PipelineComponent>();
pipeStAX.addComponent(new StAXGenerator(input));                                         (1)
pipeStAX.addComponent(new StAXToSAXPipelineAdapter());                                   (2)
pipeStAX.addComponent(new CleaningTransformer());                                        (3)
pipeStAX.addComponent(new XMLSerializer());                                              (4)
pipeStAX.setup(System.out);
pipeStAX.execute();
1

The pipeline starts with a StAXGenerator.

2

The adapter converts the StAX events to SAX method calls.

3

The CleaningTransformer is a SAX component.

4

The XMLSerializer writes the SAX method calls to the output stream passed to setup (here System.out).
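The StAX-to-SAX direction also exists in the JDK, which may help to understand what such an adapter has to do. The sketch below is an assumption-laden illustration and not the implementation of StAXToSAXPipelineAdapter: it feeds an XMLEventReader through an identity Transformer into a SAX ContentHandler using StAXSource and SAXResult. The class StaxToSaxSketch is made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stax.StAXSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StaxToSaxSketch {

    // Reads the document with StAX and reports the element names seen by a SAX handler.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();

        // SAX side: a push-style consumer that is called for every start tag.
        DefaultHandler saxConsumer = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                names.add(qName);
            }
        };

        try {
            // StAX side: a pull-style producer.
            XMLEventReader staxProducer =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));

            // The identity transformer pulls StAX events and pushes them as SAX calls.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new StAXSource(staxProducer), new SAXResult(saxConsumer));
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

The conversion in this direction is straightforward because the adapter can simply pull until the input is exhausted and push each event onwards; going back from SAX to StAX requires buffering the pushed events, which is one source of the overhead mentioned for wrapped SAX components.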

2.5.4. Java 1.5 support

In order to use StAX with Java 1.5 an additional dependency is needed in the project's pom.xml.

<dependency>
  <groupId>org.codehaus.woodstox</groupId>
  <artifactId>wstx-asl</artifactId>
  <version>3.2.7</version>
</dependency>

Using Woodstox is simpler, as the reference implementation depends on JAXP 1.4, which is not part of Java 1.5.

2.6. Utilities

TBW: XMLUtils, TransformUtils