Chaperon - Text grammar format

This project has retired. For details please refer to its Attic page.

Text Grammar Format

Structure

The text grammar consists of two parts. The first part contains the token definitions and the special instruction declarations. The second part, which follows the "%%" separator, contains the productions.


	[tokens]
	[special instructions]
	%start "Symbol of the production" ;
	%%
	[productions]

The declaration "%start" declares the root production for the result document.

Lexical tokens

The tokens are similar to the tokens of the XML grammar. Token definitions in the text grammar are written as regular expressions.


	%token WORD [A-Za-z][a-z]* ;

Alternations

Alternation means that one of the contained elements must match.


	%token CHAR [A-Za-z] | [0-9] ;
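
As a rough illustration outside Chaperon, the CHAR token can be approximated with Python's `re` module. The Python pattern and the helper `is_char` are assumptions made for this sketch; Chaperon's own matcher may differ in details.

```python
import re

# Approximation of: %token CHAR [A-Za-z] | [0-9] ;
# re.VERBOSE ignores the spaces around "|", much like the grammar example.
CHAR = re.compile(r"[A-Za-z] | [0-9]", re.VERBOSE)


def is_char(text):
    """Return True if text is exactly one letter or one digit."""
    return CHAR.fullmatch(text) is not None
```

Either alternative is enough for a match, so both `is_char("a")` and `is_char("7")` are true, while `is_char("_")` is false.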

Concatenations

Concatenation means that all elements in a sequence must match.


	%token IDENTIFIER [A-Za-z] [A-Za-z0-9_]* ;
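
The IDENTIFIER token can be sketched in Python in the same way. This is only an analogue under the assumption that spaces between elements are insignificant, as they are in Python's verbose mode; `match_identifier` is a hypothetical helper.

```python
import re

# Approximation of: %token IDENTIFIER [A-Za-z] [A-Za-z0-9_]* ;
# The first class must match, and then the repeated second class follows it.
IDENTIFIER = re.compile(r"[A-Za-z] [A-Za-z0-9_]*", re.VERBOSE)


def match_identifier(text):
    """Return the identifier at the start of text, or None if there is none."""
    m = IDENTIFIER.match(text)
    return m.group(0) if m else None
```

Because the leading letter is required, input such as `1foo` does not yield an identifier at all.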

Character classes

A character class matches a single character against the set of characters the class contains. There are two forms: a plain character class, which matches any character in the set, and a negated character class (introduced by "^"), which matches any character that is not in the set.


	%token PUNCTUATION [\.,\;\?!] ;
	%token NOTNUMBER [^0-9] ;
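
A small Python sketch of the two forms, assuming the classes behave like their `re` counterparts. The function `classify` is hypothetical and exists only for this illustration.

```python
import re

# Approximations of:
#   %token PUNCTUATION [\.,\;\?!] ;
#   %token NOTNUMBER [^0-9] ;
PUNCTUATION = re.compile(r"[\.,;\?!]")
NOTNUMBER = re.compile(r"[^0-9]")


def classify(ch):
    """List which of the two classes a single character belongs to."""
    names = []
    if PUNCTUATION.fullmatch(ch):
        names.append("PUNCTUATION")
    if NOTNUMBER.fullmatch(ch):
        names.append("NOTNUMBER")
    return names
```

A punctuation mark falls into both classes, a letter only into the negated one, and a digit into neither.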

Universal character

The universal character "." matches every character except carriage return and line feed.


	%token COMMENT // .* ;
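
When approximating this token in Python, note that Python's "." excludes only the line feed. The sketch below therefore uses an explicit negated class to exclude both characters; `first_comment` is a hypothetical helper.

```python
import re

# Approximation of: %token COMMENT // .* ;
# Python's "." would still match "\r", so [^\r\n] is used instead to stop
# at either a carriage return or a line feed.
COMMENT = re.compile(r"//[^\r\n]*")


def first_comment(source):
    """Return the first // comment in source, or None if there is none."""
    m = COMMENT.search(source)
    return m.group(0) if m else None
```

The match stops at the end of the line, so a comment never swallows the following line.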

Begin of line

The symbol "^" matches the beginning of a line.


	%token NOTE ^ \[ [0-9]+ \] ;

End of line

The symbol "$" matches the end of a line.


	%token BREAK \\ \\ $ ;
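
The two line anchors can be tried out in Python with the `re.MULTILINE` flag. This sketch assumes that each `\\` in the BREAK token denotes one literal backslash; the sample text is invented for the illustration.

```python
import re

# Approximations of:
#   %token NOTE ^ \[ [0-9]+ \] ;
#   %token BREAK \\ \\ $ ;
# re.MULTILINE makes ^ and $ match at every line boundary, not only at the
# start and end of the whole input.
NOTE = re.compile(r"^\[[0-9]+\]", re.MULTILINE)
BREAK = re.compile(r"\\\\$", re.MULTILINE)

# Line 1 ends with two backslashes; "[2]" is not at the start of its line.
text = "[1] first note\\\\\nsee [2] inline\n[3] last note"

notes = NOTE.findall(text)
breaks = [m.start() for m in BREAK.finditer(text)]
```

Only the bracketed numbers that open a line are collected in `notes`, and `breaks` holds the offset of the double backslash that closes the first line.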

Abbreviations

If a regular expression is used often, you can define an abbreviation for it with "%ab" and refer to it as "<NAME>".


	%ab NUMBER [0-9] ;
	%token FLOAT <NUMBER>+ \. <NUMBER>+ ;
	%token INT <NUMBER>+ ;
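
One way to picture abbreviations is as textual substitution performed before the pattern is compiled. The Python sketch below works under that assumption; the table `ABBREVIATIONS` and the function `expand` are hypothetical and do not describe how Chaperon resolves abbreviations internally.

```python
import re

# Hypothetical abbreviation table, mirroring: %ab NUMBER [0-9] ;
ABBREVIATIONS = {"NUMBER": "[0-9]"}


def expand(pattern):
    """Replace each <NAME> reference with its definition in a group."""
    return re.sub(
        r"<([A-Za-z_]+)>",
        lambda m: "(?:" + ABBREVIATIONS[m.group(1)] + ")",
        pattern,
    )


# Mirrors: %token FLOAT <NUMBER>+ \. <NUMBER>+ ;
FLOAT = re.compile(expand(r"<NUMBER>+ \. <NUMBER>+"), re.VERBOSE)
# Mirrors: %token INT <NUMBER>+ ;
INT = re.compile(expand(r"<NUMBER>+"), re.VERBOSE)
```

Wrapping each definition in a non-capturing group keeps a trailing "+" applied to the whole abbreviation rather than to its last element.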

Comments and whitespace

Comments and whitespace are two special tokens that can appear at any position in the parsed text. The parser reads these tokens and then discards them.


	%ignore whitespace [\n\r\ ];
	%ignore comment // .* ;
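
The read-and-discard behaviour can be sketched with a very small tokenizer in Python. This is a simplified model, not Chaperon's lexer: it knows a single ordinary token, WORD, and the function name `tokenize` is an assumption of the sketch.

```python
import re

# Patterns to skip, mirroring:
#   %ignore whitespace [\n\r\ ];
#   %ignore comment // .* ;
IGNORE = re.compile(r"[\n\r ]|//[^\r\n]*")
# One ordinary token for the demo, mirroring: %token WORD [A-Za-z][a-z]* ;
WORD = re.compile(r"[A-Za-z][a-z]*")


def tokenize(text):
    """Return the WORD tokens in text, silently dropping ignored input."""
    tokens, pos = [], 0
    while pos < len(text):
        m = IGNORE.match(text, pos)
        if m:
            pos = m.end()  # read the ignored token, then discard it
            continue
        m = WORD.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens
```

The ignored tokens never reach the result list, which is why the productions do not need to mention them.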

Productions

Productions are handled much like the productions in the XML grammar. A symbol can have more than one definition, with the alternatives separated by "|".


	[Symbol of the production] : [Symbol1] [Symbol2] [..]
	                           | [Symbol1] [..]
	                           ;

To set the precedence of a production, use "%prec".


	example : WORD float %prec PLUS
	        | WORD
	        ;

To set the reduce type, use "%reducetype".


	list : list line %reducetype APPEND
	     | line %reducetype APPEND
	     ;