Chaperon - Text grammar format

Text Grammar Format
Structure

A text grammar consists of two parts, separated by a line containing "%%". The first part contains the token definitions and the special instructions; the second part contains the productions.

[tokens] 
[special instructions]

%start "Symbol of the production" ;

%%
[productions]

The "%start" declaration specifies the production that forms the root of the result document.
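
To make the two-part layout concrete, here is a minimal Python sketch that splits a grammar at the "%%" line and reads the "%start" symbol. The helper name `split_grammar` and the sample grammar are hypothetical, and the sketch assumes "%%" stands alone on its line; it is not part of Chaperon.

```python
import re
from typing import Optional, Tuple


def split_grammar(source: str) -> Tuple[str, str, Optional[str]]:
    """Return (declarations, productions, start symbol) of a text grammar.

    Hypothetical helper: assumes "%%" stands alone on its line and that the
    start declaration looks like "%start symbol ;".
    """
    parts = re.split(r"^%%[ \t]*$", source, maxsplit=1, flags=re.MULTILINE)
    if len(parts) != 2:
        raise ValueError('grammar has no "%%" separator line')
    declarations, productions = parts
    # Look for "%start <symbol> ;" at the beginning of a declaration line.
    match = re.search(r"^%start\s+(\S+)\s*;", declarations, flags=re.MULTILINE)
    start = match.group(1) if match else None
    return declarations, productions, start


grammar = """%token WORD [A-Za-z][a-z]* ;
%start list ;
%%
list : list WORD
     | WORD
     ;
"""
declarations, productions, start = split_grammar(grammar)
print(start)  # list
```
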

Lexical tokens

Tokens are similar to the tokens of the XML grammar, but in the text grammar they are defined with regular expressions:

%token WORD [A-Za-z][a-z]* ;
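
As a rough cross-check of what a token pattern accepts, the WORD expression can be tried with Python's `re` module. This is only an approximation using Python's regex engine, not Chaperon's; the name `is_word` is hypothetical.

```python
import re

# Python approximation of: %token WORD [A-Za-z][a-z]* ;
WORD = re.compile(r"[A-Za-z][a-z]*")


def is_word(text: str) -> bool:
    """True if the whole string is exactly one WORD token."""
    return WORD.fullmatch(text) is not None
```

Only the first letter may be uppercase, so "HELLO" is not a single WORD; matching from the start consumes just "H".
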
Alternations

Alternation means that one of the contained elements must match.

%token CHAR [A-Za-z] | [0-9] ;
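
The same alternation written for Python's `re` module, as an approximation rather than Chaperon's own engine (`is_char` is a hypothetical name):

```python
import re

# Python approximation of: %token CHAR [A-Za-z] | [0-9] ;
# "|" separates the alternatives; the single character must match one of them.
CHAR = re.compile(r"[A-Za-z]|[0-9]")


def is_char(text: str) -> bool:
    return CHAR.fullmatch(text) is not None
```
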
Concatenations

Concatenation means that all elements in a sequence must match.

%token IDENTIFIER [A-Za-z] [A-Za-z0-9_]* ;
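
In the text grammar, the whitespace between `[A-Za-z]` and `[A-Za-z0-9_]*` appears to act only as a separator, so a Python approximation simply writes the two parts next to each other. A sketch, with the hypothetical helper `is_identifier`:

```python
import re

# Python approximation of: %token IDENTIFIER [A-Za-z] [A-Za-z0-9_]* ;
# Concatenation: a letter followed by any number of letters, digits or "_".
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    return IDENTIFIER.fullmatch(text) is not None
```
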
Character classes

A character class matches a single character against the set of characters it contains. A character class can be either plain or negated: a negated class, written with "^" after the opening bracket, matches any single character that is not in the set.

%token PUNCTUATION [\.,\;\?!] ;
%token NOTNUMBER [^0-9] ;
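
Both classes translate directly to Python's `re` syntax. Inside a Python character class, `.`, `;` and `?` need no escaping, so the sketch below drops the backslashes; the semantics are assumed to match the grammar's.

```python
import re

# Python approximations of the two token definitions above.
PUNCTUATION = re.compile(r"[.,;?!]")  # one of . , ; ? !
NOTNUMBER = re.compile(r"[^0-9]")     # negated: any character except a digit

print(PUNCTUATION.findall("Hi, you? Yes!"))  # [',', '?', '!']
print(NOTNUMBER.findall("a1b2"))             # ['a', 'b']
```
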
Universal character

The universal character "." matches any character except carriage return and line feed:

%token COMMENT // .* ;
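
Python's `.` differs slightly: without `re.DOTALL` it excludes only line feed, so it would still match a carriage return. To mirror the behavior described here, a Python approximation can use `[^\r\n]` instead. A small sketch:

```python
import re

# "//.*" in Python: "." still matches "\r", so a CRLF line keeps its "\r".
naive = re.compile(r"//.*")
# Closer to the grammar's universal character: anything except CR and LF.
COMMENT = re.compile(r"//[^\r\n]*")

text = "// note\r\nnext line"
print(repr(naive.match(text).group()))    # '// note\r'
print(repr(COMMENT.match(text).group()))  # '// note'
```
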
Beginning of line

The symbol "^" matches the beginning of a line:

%token NOTE ^ \[ [0-9]+ \] ;
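
In Python, `^` matches only at the start of the whole string unless `re.MULTILINE` is set. The sketch below sets that flag to approximate the grammar's line-oriented `^`; dropping the separating spaces from `^ \[ [0-9]+ \]` assumes they are not literal characters in the grammar.

```python
import re

# Python approximation of: %token NOTE ^ \[ [0-9]+ \] ;
# re.MULTILINE lets "^" match at the start of every line, not only the text.
NOTE = re.compile(r"^\[[0-9]+\]", re.MULTILINE)

text = "[1] first note\nsee [2] inline\n[3] last note"
print(NOTE.findall(text))  # ['[1]', '[3]']
```
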
End of line

The symbol "$" matches the end of a line:

%token BREAK \\ \\ $ ;
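
With `re.MULTILINE`, Python's `$` matches before a line feed or at the end of the text, but not before a carriage return, so text with CRLF line endings would be missed. The sketch below therefore uses a lookahead for CR, LF, or end of input; treating all three as line ends is an assumption about the grammar's `$`.

```python
import re

# Python approximation of: %token BREAK \\ \\ $ ;
# Two backslashes, followed by CR, LF or the end of the input.
# (Plain "$" with re.MULTILINE would miss the CRLF case.)
BREAK = re.compile(r"\\\\(?=[\r\n]|\Z)")

text = "first \\\\\r\nsecond \\\\ middle\nthird \\\\"
print(len(BREAK.findall(text)))  # 2
```
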
Abbreviations

If a regular expression is used often, you can declare an abbreviation for it with "%ab" and then refer to it by name in angle brackets:

%ab NUMBER [0-9] ;
%token FLOAT <NUMBER>+ \. <NUMBER>+ ;
%token INT <NUMBER>+ ;
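
One way to picture abbreviations is as textual substitution before the pattern is compiled. The Python sketch below assumes exactly that; the helper `expand` and the `(?:...)` wrapping (so that `+` applies to the whole definition) are illustrative choices, not a description of Chaperon's internals.

```python
import re
from typing import Dict

# Hypothetical abbreviation table, mirroring: %ab NUMBER [0-9] ;
ABBREVIATIONS: Dict[str, str] = {"NUMBER": "[0-9]"}


def expand(pattern: str, abbreviations: Dict[str, str]) -> str:
    """Replace each <NAME> with its definition wrapped in a non-capturing group.

    An unknown name raises KeyError.
    """
    def substitute(m: re.Match) -> str:
        return "(?:" + abbreviations[m.group(1)] + ")"

    return re.sub(r"<([A-Za-z_][A-Za-z0-9_]*)>", substitute, pattern)


FLOAT = re.compile(expand(r"<NUMBER>+\.<NUMBER>+", ABBREVIATIONS))
INT = re.compile(expand(r"<NUMBER>+", ABBREVIATIONS))

print(FLOAT.pattern)  # (?:[0-9])+\.(?:[0-9])+
```
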
Comments and Whitespace

Comments and whitespace are two special kinds of tokens that can appear at any position in the parsed text. They are declared with "%ignore": the parser reads them and then discards them.

%ignore whitespace [\n\r\ ];
%ignore comment // .* ;
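
The effect of "%ignore" can be sketched as a scanner that recognizes ignored tokens like any other and then drops them. In this hypothetical Python version, the token table, the function `tokenize`, and the longest-match rule with declaration order breaking ties are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical token table mirroring the declarations above, in Python syntax.
# Each entry is (name, pattern, ignored); ignored tokens correspond to %ignore.
TOKENS = [
    ("WORD", r"[A-Za-z][a-z]*", False),
    ("INT", r"[0-9]+", False),
    ("whitespace", r"[\n\r ]", True),
    ("comment", r"//[^\r\n]*", True),
]
COMPILED = [(name, re.compile(pattern), ignored) for name, pattern, ignored in TOKENS]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Scan text into (name, lexeme) pairs, dropping ignored tokens."""
    result: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best: Optional[Tuple[int, str, bool]] = None  # (length, name, ignored)
        for name, regex, ignored in COMPILED:
            m = regex.match(text, pos)
            # Keep the longest non-empty match; earlier entries win ties.
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, ignored)
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        length, name, ignored = best
        if not ignored:
            result.append((name, text[pos:pos + length]))
        pos += length
    return result


print(tokenize("Abc 12 // a note\nxyz"))
# [('WORD', 'Abc'), ('INT', '12'), ('WORD', 'xyz')]
```
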
Productions

Productions are handled similarly to the productions in the XML grammar. Several alternative definitions of the same symbol can be declared with an alternation ("|"):

[Symbol of the production] : [Symbol1] [Symbol2] [..]
                           | [Symbol1] [..]
                           ;
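
The meaning of "|" in a production (any one alternative may derive the symbol) and of a sequence of symbols (all must appear in order) can be illustrated with a naive backtracking recognizer over token names. This Python sketch, including the `sentence` grammar and the helpers `end_positions` and `accepts`, is hypothetical and is not how Chaperon parses.

```python
from typing import Dict, List, Set

# Hypothetical grammar, equivalent to:
#   sentence : WORD INT
#            | WORD
#            ;
# Each symbol maps to its alternatives; each alternative is a symbol sequence.
GRAMMAR: Dict[str, List[List[str]]] = {
    "sentence": [["WORD", "INT"], ["WORD"]],
}


def end_positions(symbol: str, tokens: List[str], pos: int) -> Set[int]:
    """Return every position where `symbol` can end if it starts at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next token name
        return {pos + 1} if pos < len(tokens) and tokens[pos] == symbol else set()
    ends: Set[int] = set()
    for alternative in GRAMMAR[symbol]:  # alternation: try every alternative
        positions = {pos}
        for part in alternative:  # concatenation: all parts, in order
            positions = {e for p in positions for e in end_positions(part, tokens, p)}
        ends |= positions
    return ends


def accepts(start: str, tokens: List[str]) -> bool:
    return len(tokens) in end_positions(start, tokens, 0)


print(accepts("sentence", ["WORD", "INT"]))  # True
print(accepts("sentence", ["WORD"]))         # True
print(accepts("sentence", ["INT"]))          # False
```

Being top-down, this sketch would recurse forever on a left-recursive production such as `list : list line`.
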

To set the precedence of a production, use "%prec":

example : WORD float %prec PLUS
        | WORD
        ;

To set the reduce type of a production, use "%reducetype":

list : list line %reducetype APPEND
     | line      %reducetype APPEND
     ;
Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.