Chaperon - Text grammar format




Text Grammar Format

The text grammar consists of two parts. The first part contains the token definitions and special instruction declarations; the second part contains the productions.

[special instructions]

%start "Symbol of the production" ;


The "%start" declaration names the root production of the result document.

Lexical tokens

The tokens are similar to the tokens of the XML grammar. Token definitions in the text grammar use regular expressions:

%token WORD [A-Za-z][a-z]* ;

Alternation means that one of the contained elements must match.

%token CHAR [A-Za-z] | [0-9] ;

Concatenation means that all elements in a sequence must match.

%token IDENTIFIER [A-Za-z] [A-Za-z0-9_]* ;
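The three token definitions above can be tried out with Python's re module. This is only a sketch, not Chaperon's implementation: it assumes that character classes, "*", "|" and concatenation behave like their Python counterparts, and it uses re.VERBOSE so that the spaces separating the elements are ignored, as they appear to be in the grammar format. The names TOKENS and matches are hypothetical.

```python
import re

# Chaperon token definitions transcribed as Python patterns.
# re.VERBOSE makes the regex engine ignore the spaces between elements.
TOKENS = {
    "WORD": re.compile(r"[A-Za-z][a-z]*", re.VERBOSE),
    "CHAR": re.compile(r"[A-Za-z] | [0-9]", re.VERBOSE),              # alternation
    "IDENTIFIER": re.compile(r"[A-Za-z] [A-Za-z0-9_]*", re.VERBOSE),  # concatenation
}


def matches(token, text):
    """Return True if the whole text is a single token of the given kind."""
    return TOKENS[token].fullmatch(text) is not None


if __name__ == "__main__":
    for token, text in [("WORD", "Hello"), ("CHAR", "7"), ("IDENTIFIER", "a_1")]:
        print(token, repr(text), matches(token, text))
```

With these patterns, "Hello" is a WORD but "HeLlo" is not, because only the first letter may be upper case.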

Character classes

A character class matches a single character against the set of characters it contains. There are two forms: a plain character class, which matches any character in the set, and a negated character class (introduced by "^"), which matches any character that is not in the set.

%token PUNCTUATION [\.,\;\?!] ;
%token NOTNUMBER [^0-9] ;
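The two character classes above can be checked the same way. The sketch below assumes that Chaperon's bracket syntax matches Python's for these simple cases; is_punctuation and is_not_number are hypothetical helper names.

```python
import re

PUNCTUATION = re.compile(r"[\.,\;\?!]")  # one of . , ; ? !
NOTNUMBER = re.compile(r"[^0-9]")        # one character that is not a digit


def is_punctuation(ch):
    """True if ch is exactly one character from the PUNCTUATION class."""
    return PUNCTUATION.fullmatch(ch) is not None


def is_not_number(ch):
    """True if ch is exactly one character outside the range 0-9."""
    return NOTNUMBER.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in [";", "a", "5"]:
        print(repr(ch), is_punctuation(ch), is_not_number(ch))
```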

Universal character

The universal character "." matches any character except carriage return and line feed.

%token COMMENT // .* ;
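One detail matters when emulating this token elsewhere: Python's "." excludes only the line feed, whereas the universal character described here also excludes the carriage return. The sketch below therefore spells the universal character out as [^\r\n]. The function name first_comment is hypothetical.

```python
import re

# Chaperon's "." stops at both \r and \n; Python's "." stops only at \n,
# so the universal character is written explicitly as [^\r\n].
COMMENT = re.compile(r"//[^\r\n]*")


def first_comment(text):
    """Return the first // comment in text, or None if there is none."""
    m = COMMENT.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_comment("x = 1 // set x\r\ny = 2"))
```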

Beginning of line

The symbol "^" matches the beginning of a line.

%token NOTE ^ \[ [0-9]+ \] ;

End of line

The symbol "$" matches the end of a line.

%token BREAK \\ \\ $ ;
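The two line anchors can be illustrated with Python's re.MULTILINE flag, under the assumption that it corresponds to the per-line anchoring described above. NOTE and BREAK are transcribed from the definitions above; the sample text is made up for the illustration.

```python
import re

FLAGS = re.VERBOSE | re.MULTILINE  # ignore spaces, anchor at every line

NOTE = re.compile(r"^ \[ [0-9]+ \]", FLAGS)  # [digits] at the start of a line
BREAK = re.compile(r"\\ \\ $", FLAGS)        # two backslashes at the end of a line

# Hypothetical sample input: only the first line starts with a note
# and ends with a double backslash.
lines = [r"[1] opening note \\", r"see [2] inline", r"a \\ b"]
text = "\n".join(lines)

if __name__ == "__main__":
    print(NOTE.findall(text))        # notes found at a line start
    print(len(BREAK.findall(text)))  # number of line-ending breaks
```

Here "[2]" is not reported as a NOTE because it does not start its line, and the double backslash in the third line is not a BREAK because it does not end its line.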

If a regular expression is used often, you can define an abbreviation for it with "%ab" and reference it as "<NAME>":

%ab NUMBER [0-9] ;
%token FLOAT <NUMBER>+ \. <NUMBER>+ ;
%token INT <NUMBER>+ ;
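An abbreviation can be thought of as textual substitution performed before the expression is compiled. The sketch below illustrates that reading in Python; whether Chaperon actually expands abbreviations this way is an assumption, and ABBREVIATIONS and expand are hypothetical names.

```python
import re

# Abbreviation table, as declared with %ab above.
ABBREVIATIONS = {"NUMBER": "[0-9]"}


def expand(pattern):
    """Replace each <NAME> reference with its abbreviation body.

    The body is wrapped in a non-capturing group so that a following
    operator such as "+" applies to the whole abbreviation.
    """
    return re.sub(
        r"<([A-Za-z_]+)>",
        lambda m: "(?:" + ABBREVIATIONS[m.group(1)] + ")",
        pattern,
    )


FLOAT = re.compile(expand(r"<NUMBER>+ \. <NUMBER>+"), re.VERBOSE)
INT = re.compile(expand(r"<NUMBER>+"), re.VERBOSE)

if __name__ == "__main__":
    print(expand(r"<NUMBER>+"))
    print(bool(FLOAT.fullmatch("3.14")), bool(INT.fullmatch("42")))
```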

Comments and whitespace

Whitespace and comments are two special tokens that can appear at any position in the parsed text. The parser reads these tokens and then discards them.

%ignore whitespace [\n\r\ ];
%ignore comment // .* ;
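The effect of "%ignore" can be sketched with a small hand-written tokenizer: ignored tokens are recognized like any other token but never reach the output. This is an illustration only. The rule table, the first-match strategy and the name tokenize are assumptions and do not describe how Chaperon's lexer is implemented.

```python
import re

# (name, pattern, ignored) in priority order. The first two rules
# correspond to the %ignore declarations, the others to %token.
RULES = [
    ("whitespace", re.compile(r"[\n\r ]"), True),
    ("comment", re.compile(r"//[^\r\n]*"), True),
    ("WORD", re.compile(r"[A-Za-z][a-z]*"), False),
    ("INT", re.compile(r"[0-9]+"), False),
]


def tokenize(text):
    """Split text into (name, lexeme) pairs, dropping ignored tokens."""
    pos, out = 0, []
    while pos < len(text):
        for name, pattern, ignored in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                if not ignored:
                    out.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            # No rule matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return out


if __name__ == "__main__":
    print(tokenize("Hello 42 // greeting\nworld"))
```

The comment and the whitespace are consumed but do not appear in the result, so the parser would only see WORD, INT and WORD.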

Productions are handled similarly to the productions in the XML grammar. A production can have more than one definition; the definitions are separated by an alternation:

[Symbol of the production] : [Symbol1] [Symbol2] [..]
                           | [Symbol1] [..]

To set the precedence of a production, use "%prec":

example : WORD float %prec PLUS
        | WORD

To set the reduce type, use "%reducetype":

list : list line %reducetype APPEND
     | line      %reducetype APPEND
Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.