Chaperon How-To

Introduction

Chaperon is a project that helps convert structured text to XML. It includes an LALR(1) parser, which parses the text, and a tree builder, which creates an XML document from the result.

What is structured text?

Examples of structured text include TeX files, Java source files, and configuration files.

Function

The Chaperon Parser consists of the following two components:

  • a parser table generator, and
  • a parser

The parser table generator builds a parser table from a grammar. This is similar to a compiler generating byte code: the expensive analysis of the grammar is done once, up front, so that the parsing itself runs as fast as possible.

The parser then uses this table to parse the text and generate an XML document from it.
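To make the table-driven idea concrete, here is a minimal Java sketch of the parsing half only. It is not Chaperon code: the class `TinyLrParser`, the two-rule grammar `E -> E '+' n | n`, and the element names `E`, `number` and `plus` are invented for illustration, and the table is written by hand, whereas Chaperon generates its table from the grammar. The driver looks up an action for the current state and lookahead token, shifts or reduces, and builds an XML fragment at each reduction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class TinyLrParser {

    // Hand-written LR table for the hypothetical grammar
    //   (1) E -> E '+' n
    //   (2) E -> n
    // "sK" = shift and go to state K, "rK" = reduce by rule K, "acc" = accept.
    static final Map<Integer, Map<String, String>> ACTION = Map.of(
        0, Map.of("n", "s2"),
        1, Map.of("+", "s3", "$", "acc"),
        2, Map.of("+", "r2", "$", "r2"),
        3, Map.of("n", "s4"),
        4, Map.of("+", "r1", "$", "r1"));

    // GOTO entries for the nonterminal E (only reachable from state 0 here).
    static final Map<Integer, Integer> GOTO_E = Map.of(0, 1);

    /** Parses tokens {terminal, text} ending with {"$", ""} and returns XML. */
    public static String parse(List<String[]> tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        Deque<String> nodes = new ArrayDeque<>(); // XML fragments built so far
        states.push(0);
        int pos = 0;
        while (true) {
            String[] tok = tokens.get(pos);
            String action = ACTION.get(states.peek()).get(tok[0]);
            if (action == null) {
                throw new IllegalArgumentException("Syntax error at token " + pos);
            }
            if (action.equals("acc")) {
                return nodes.pop(); // the single remaining E element
            }
            int k = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {
                // Shift: wrap the token in a leaf element and move on.
                String elem = tok[0].equals("n") ? "number" : "plus";
                nodes.push("<" + elem + ">" + tok[1] + "</" + elem + ">");
                states.push(k);
                pos++;
            } else {
                // Reduce: pop the rule's right-hand side and wrap it in <E>.
                int rhsLength = (k == 1) ? 3 : 1;
                StringBuilder children = new StringBuilder();
                for (int i = 0; i < rhsLength; i++) {
                    children.insert(0, nodes.pop());
                    states.pop();
                }
                nodes.push("<E>" + children + "</E>");
                states.push(GOTO_E.get(states.peek()));
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> tokens = List.of(
            new String[] {"n", "1"}, new String[] {"+", "+"},
            new String[] {"n", "2"}, new String[] {"$", ""});
        System.out.println(parse(tokens));
    }
}
```

In a general driver the rule lengths and goto entries would also come from the generated tables; they are hard-coded here only because the toy grammar has two rules and one nonterminal.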

The generator/transformer builds the parser table once, as a first step, and stores it in the persistent store.

If the grammar has changed, the parser creates a new parser table.
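The generate-once, regenerate-on-change behaviour can be sketched as a small cache. This is an assumption about the mechanism, not Chaperon's implementation: the hypothetical class `ParserTableCache` keys each table by a SHA-256 digest of the grammar text, an in-memory `HashMap` stands in for the persistent store, and the table-building step is passed in as a function.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: builds a parser table once per distinct grammar text. */
public class ParserTableCache<T> {

    // Stand-in for the persistent store (assumption: kept in memory here).
    private final Map<String, T> store = new HashMap<>();
    // The expensive step that turns grammar text into a parser table.
    private final Function<String, T> generator;
    private int generations = 0;

    public ParserTableCache(Function<String, T> generator) {
        this.generator = generator;
    }

    /** Returns the stored table, generating it only if this grammar text is new. */
    public T tableFor(String grammar) {
        return store.computeIfAbsent(digest(grammar), key -> {
            generations++;
            return generator.apply(grammar);
        });
    }

    /** Number of times the generator actually ran. */
    public int generations() {
        return generations;
    }

    // Hex-encoded SHA-256 of the grammar text; any edit yields a new key.
    static String digest(String grammar) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(grammar.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        ParserTableCache<String> cache = new ParserTableCache<>(g -> "table for " + g);
        cache.tableFor("E -> n");
        cache.tableFor("E -> n");           // unchanged grammar: served from the store
        cache.tableFor("E -> E '+' n | n"); // changed grammar: regenerated
        System.out.println(cache.generations()); // prints 2
    }
}
```

Keying on the grammar's content means any edit produces a new key and therefore a fresh table; a real store would also have to evict or overwrite the stale entry, which this sketch does not do.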

Grammar

The parser can be used much like an XML parser. Unlike an XML parser, however, the Chaperon parser needs a grammar file, which is itself written in XML.

Because the XML grammar format is not very convenient to write by hand, the Chaperon project also provides a grammar for a text format similar to yacc/bison, together with a stylesheet that converts this text grammar format into the XML grammar format.

This makes it easier to write a grammar in the text format than directly in XML.
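Presumably the text grammar is first parsed into XML, and the stylesheet then rewrites that XML into the XML grammar format. That second step is an ordinary XSLT transformation, which Java can run through the standard JAXP `javax.xml.transform` API. In this sketch the class `GrammarStylesheetDemo` is hypothetical, and both the input elements (`rules`, `rule`) and the output elements (`grammar`, `production`, `terminal`) are invented toy formats, not Chaperon's real ones.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GrammarStylesheetDemo {

    // Toy stylesheet: each <rule lhs="X">body</rule> becomes
    // <production symbol="X"><terminal>body</terminal></production>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/rules'>"
      + "<grammar><xsl:apply-templates select='rule'/></grammar>"
      + "</xsl:template>"
      + "<xsl:template match='rule'>"
      + "<production symbol='{@lhs}'><terminal><xsl:value-of select='.'/></terminal></production>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the toy stylesheet to an XML string and returns the result. */
    public static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Stand-in for the XML produced by parsing a text grammar (invented format).
        System.out.println(transform("<rules><rule lhs='E'>n</rule></rules>"));
    }
}
```

The same `transform` call works with any stylesheet source, so a real conversion would only swap in the actual stylesheet and the actual parser output.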

Both grammar formats, XML and text, consist of two parts. The first part contains the token definitions and declarations of special instructions; the second part contains the productions.

The token definitions are used to build a lexer, which feeds tokens to the parser. The parser then arranges these tokens into larger aggregations with the help of the production definitions.
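The lexer side can be illustrated with a minimal Java sketch. The class `TinyLexer` and its token definitions (`number`, `plus`, `whitespace`) are hypothetical regular expressions standing in for the first part of a grammar, and the longest-match rule, with ties going to the earlier definition, is a common lexer convention assumed here rather than taken from Chaperon. Whitespace is recognised but not passed on to the parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyLexer {

    /** A token: the terminal name the parser sees plus the matched text. */
    public record Token(String name, String text) {}

    // Hypothetical token definitions; insertion order breaks ties.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("number", Pattern.compile("[0-9]+"));
        TOKENS.put("plus", Pattern.compile("\\+"));
        TOKENS.put("whitespace", Pattern.compile("\\s+")); // skipped below
    }

    /** Splits the input into tokens using the longest match at each position. */
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> def : TOKENS.entrySet()) {
                Matcher m = def.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = def.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No token matches at offset " + pos);
            }
            if (!bestName.equals("whitespace")) {
                tokens.add(new Token(bestName, input.substring(pos, bestEnd)));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 + 3"));
    }
}
```

The parser only needs the token names to choose its actions; the matched text travels along so that it can end up in the generated XML.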

Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.