Text Grammar Format
Structure
The text grammar consists of two parts, separated by the line "%%". The first part contains the token definitions and special
instruction declarations; the second part contains the productions:
[tokens]
[special instructions]
%start "Symbol of the production" ;
%%
[productions]
The declaration "%start" specifies the root production of the resulting document.
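To make the two-part layout concrete, here is a minimal Python sketch that splits a grammar at the "%%" separator and reads the "%start" symbol. The sample grammar, the helper name `split_grammar`, and the assumption that the start symbol is written as a bare name are all illustrative. The sketch also assumes that "%%" does not occur anywhere else in the declaration part.

```python
import re
from typing import Optional, Tuple

# Illustrative grammar text (hypothetical symbols), laid out as described above.
GRAMMAR_TEXT = """\
%token WORD [A-Za-z][a-z]* ;
%start sentence ;
%%
sentence : sentence WORD
         | WORD
         ;
"""


def split_grammar(text: str) -> Tuple[str, Optional[str], str]:
    """Return (declarations, start symbol or None, productions)."""
    # Naive split: assumes the first "%%" is the separator line.
    declarations, separator, productions = text.partition("%%")
    if not separator:
        raise ValueError("grammar has no '%%' separator")
    # Assumption: the start symbol is written as a bare identifier.
    match = re.search(r"%start\s+([A-Za-z_]\w*)\s*;", declarations)
    start = match.group(1) if match else None
    return declarations.strip(), start, productions.strip()


declarations, start, productions = split_grammar(GRAMMAR_TEXT)
print(start)  # sentence
```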
Lexical tokens
The tokens are similar to the tokens of the XML grammar. Tokens are defined
with regular expressions:
%token WORD [A-Za-z][a-z]* ;
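For experimenting, the WORD pattern can be tried out with Python's `re` module. This assumes that the grammar's regular-expression syntax agrees with Python's for this simple pattern; the grammar's own engine may differ in other details.

```python
import re

# Same pattern as the WORD token: one letter of either case,
# followed by any number of lowercase letters.
WORD = re.compile(r"[A-Za-z][a-z]*")


def is_word(text: str) -> bool:
    """True if the whole string is a single WORD token."""
    return WORD.fullmatch(text) is not None


for candidate in ["Hello", "x", "HeLLo", "9lives"]:
    print(candidate, is_word(candidate))
```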
Alternations
Alternation means that one of the contained elements must match.
%token CHAR [A-Za-z] | [0-9] ;
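In Python's `re` syntax the same alternation is written with `|`. The sketch below, which assumes Python's semantics match the grammar's for this case, checks that the alternation accepts exactly the same single characters as the merged class `[A-Za-z0-9]`.

```python
import re
import string

CHAR = re.compile(r"[A-Za-z]|[0-9]")   # either alternative may match
MERGED = re.compile(r"[A-Za-z0-9]")    # equivalent single class

# Compare both patterns on every printable ASCII character.
same = all(
    (CHAR.fullmatch(c) is None) == (MERGED.fullmatch(c) is None)
    for c in string.printable
)
print(same)  # True
```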
Concatenations
Concatenation means that all elements of a sequence must match, one after another in the given order.
%token IDENTIFIER [A-Za-z] [A-Za-z0-9_]* ;
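The IDENTIFIER token is a concatenation: a letter, then any run of letters, digits and underscores. One way to see it at work is to scan a string with Python's `re.findall`. The sample input is made up, and Python's `re` stands in for the grammar's own engine.

```python
import re

# A letter followed by zero or more letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

source = "count_2 = max(a1, _tmp) + 3"
# The "_" of "_tmp" is skipped: the first element must be a letter.
print(IDENTIFIER.findall(source))  # ['count_2', 'max', 'a1', 'tmp']
```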
Character classes
A character class matches a single character against the set of characters
the class contains. There are two kinds: a plain character class matches any
character in the set, while a negated character class, written with "^" after
the opening bracket, matches any character that is not in the set.
%token PUNCTUATION [\.,\;\?!] ;
%token NOTNUMBER [^0-9] ;
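Both classes can be checked with Python's `re`, assuming the escapes in PUNCTUATION mean the same there as in the grammar (in Python they simply denote the literal characters). The sample strings are illustrative.

```python
import re

PUNCTUATION = re.compile(r"[\.,\;\?!]")  # any one of . , ; ? !
NOTNUMBER = re.compile(r"[^0-9]")        # any one character that is not a digit

print(PUNCTUATION.findall("Hi, you! Ok? yes; no."))  # [',', '!', '?', ';', '.']
print(NOTNUMBER.findall("a1b2"))                     # ['a', 'b']
```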
Universal character
The universal character "." matches any character except carriage return and line feed.
Beginning of line
The symbol "^" matches the beginning of a line:
%token NOTE ^ \[ [0-9]+ \] ;
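The following Python sketch mirrors the universal character and the NOTE token. It assumes, as in the other examples, that the spaces between elements in the grammar only separate them. Two Python details differ from the description above: Python's "." excludes only "\n", so the negated class `[^\r\n]` is used to exclude both line-break characters, and `re.MULTILINE` is needed for "^" to match after every line break rather than only at the start of the string.

```python
import re

# Universal character: everything except carriage return and line feed.
UNIVERSAL = re.compile(r"[^\r\n]")

# NOTE token: "[", one or more digits, "]", anchored at the beginning of a line.
NOTE = re.compile(r"^\[[0-9]+\]", re.MULTILINE)

TEXT = "[1] first note\nsee [2] inline\n[3] third note"
notes = NOTE.findall(TEXT)
print(notes)  # ['[1]', '[3]']
```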
End of line
This symbol matches the end of a line.
Abbreviations
If a regular expression is used often, you can declare an abbreviation for it with "%ab"
and refer to it by name in angle brackets:
%ab NUMBER [0-9] ;
%token FLOAT <NUMBER>+ \. <NUMBER>+ ;
%token INT <NUMBER>+ ;
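One way to picture an abbreviation is as a named pattern that is substituted wherever `<NAME>` appears. The Python sketch below does this textual expansion before compiling; the table and the `expand` helper are hypothetical and are not claimed to be how the grammar tool itself works. Each definition is wrapped in a non-capturing group so that a following `+` applies to the whole abbreviation, which matters once an abbreviation has more than one element.

```python
import re

# Hypothetical abbreviation table mirroring "%ab NUMBER [0-9] ;".
ABBREVIATIONS = {"NUMBER": "[0-9]"}


def expand(pattern: str) -> str:
    """Replace each <NAME> with its definition, wrapped in a non-capturing group."""
    return re.sub(
        r"<([A-Za-z_]\w*)>",
        lambda m: "(?:" + ABBREVIATIONS[m.group(1)] + ")",
        pattern,
    )


FLOAT = re.compile(expand(r"<NUMBER>+\.<NUMBER>+"))
INT = re.compile(expand(r"<NUMBER>+"))

print(expand(r"<NUMBER>+\.<NUMBER>+"))  # (?:[0-9])+\.(?:[0-9])+
```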
Comments and Whitespace
Comments and whitespace are two special tokens that can appear at any position in
the parsed text. They are declared with "%ignore": the parser reads these tokens and then discards them.
%ignore whitespace [\n\r\ ];
%ignore comment // .* ;
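The effect of "%ignore" can be sketched as a small Python tokenizer that still matches ignored tokens but drops them from its output. The rule names and patterns are illustrative, modelled on the examples in this section; the comment pattern uses `[^\r\n]*` to stand in for the universal character. Rules are tried in order and the first match wins (so FLOAT is listed before INT). This is a simplification, and the actual parser may resolve overlapping tokens differently, for example by longest match.

```python
import re
from typing import List, Tuple

# (name, pattern, ignored?) tried in order; the first rule that matches wins.
RULES = [
    ("whitespace", r"[\n\r ]", True),   # like %ignore whitespace
    ("comment", r"//[^\r\n]*", True),   # like %ignore comment: "//" to end of line
    ("FLOAT", r"[0-9]+\.[0-9]+", False),
    ("INT", r"[0-9]+", False),
    ("WORD", r"[A-Za-z][a-z]*", False),
]
COMPILED = [(name, re.compile(pattern), ignored) for name, pattern, ignored in RULES]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (name, lexeme) pairs, dropping ignored tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for name, pattern, ignored in COMPILED:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                if not ignored:  # ignored tokens are read, then discarded
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


print(tokenize("Pi 3.14 // approx\nNext 42"))
```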
Productions
Productions are handled similarly to the productions in the XML grammar.
A symbol can have more than one definition; the alternatives are separated by "|":
[Symbol of the production] : [Symbol1] [Symbol2] [..]
| [Symbol1] [..]
;
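How alternatives and symbol sequences combine can be illustrated with a naive backtracking recognizer in Python that works on token types. The grammar below is hypothetical (loosely modelled on the `example` production later in this section). This is only an illustration of alternation and concatenation, not the parser the grammar tool uses: left-recursive productions such as `list : list line` would make this naive recursion loop forever.

```python
from typing import Dict, List, Set

# Hypothetical grammar: each nonterminal maps to its alternatives, each a
# sequence of symbols. Names not in the table are treated as token types.
GRAMMAR: Dict[str, List[List[str]]] = {
    "example": [["WORD", "number"], ["WORD"]],
    "number": [["INT"], ["FLOAT"]],
}


def ends(symbol: str, tokens: List[str], pos: int) -> Set[int]:
    """Return every position where `symbol` can finish if it starts at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next token type
        return {pos + 1} if pos < len(tokens) and tokens[pos] == symbol else set()
    result: Set[int] = set()
    for alternative in GRAMMAR[symbol]:  # alternation: try every alternative
        positions = {pos}
        for sym in alternative:          # concatenation: match symbols in order
            positions = {e for p in positions for e in ends(sym, tokens, p)}
        result |= positions
    return result


def accepts(start: str, tokens: List[str]) -> bool:
    """True if the whole token list derives from `start`."""
    return len(tokens) in ends(start, tokens, 0)


print(accepts("example", ["WORD", "FLOAT"]))  # True
```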
To set the precedence of a production, use "%prec":
example : WORD float %prec PLUS
| WORD
;
To set the reduce type of a production, use "%reducetype":
list : list line %reducetype APPEND
| line %reducetype APPEND
;