This page summarizes the basics of writing syntactic rules. These rules define grammar variables and are written in the rules section of a grammar.
All syntactic rules are written as:
myvar -> body ;
This rule declares the grammar variable myvar and defines it with the associated context-free body. Every rule ends with a semicolon (;). An example of a valid rule is:
stmt_while -> 'while' '(' expression ')' stmt_embedded ;
Syntactic rules can appear in any order. Any variable defined in a grammar can be referenced from any syntactic rule.
The body of a syntactic rule can refer to terminals and variables defined elsewhere in the grammar. Terminals referenced in syntactic rules must be defined in the terminals section of the grammar using a lexical rule (see Terminals). Example:
type_parameter -> IDENTIFIER ;
Inline terminals can be defined directly within syntactic rules by writing them between single quotes. Inline terminals are restricted to simple text; more complex cases require a lexical rule. For example:
stmt_while -> 'while' '(' expression ')' stmt_embedded ;
Virtual symbols are neither terminals nor variables. They do not match any input. Instead, the parser inserts them at the position where they appear in a syntactic rule. For example:
variable -> a b "my_virtual" c ;
Note that virtual symbols are written between double quotes. When the parser applies this rule, it inserts the virtual symbol named "my_virtual" and creates a corresponding node in the produced AST. Virtual symbols can thus serve as markers in the AST that a later traversal, for example by a semantic analyzer, can rely on.
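The behavior described above can be illustrated with a toy matcher in Python. This is only a sketch under simplifying assumptions, not the tool's implementation: the function name match_sequence and the (kind, name) node tuples are hypothetical, and a, b and c are treated as terminals that each match one input token. The point it shows is that a virtual symbol consumes no input yet still produces a node.

```python
def match_sequence(body, tokens):
    """Match a flat rule body against a token list (toy model).

    Symbols written between double quotes are treated as virtual symbols:
    they consume no token but still yield a node. Every other symbol is
    assumed to be a terminal that must equal the next token.
    Returns the list of child nodes, or None if the input does not match.
    """
    children = []
    position = 0
    for symbol in body:
        if symbol.startswith('"'):
            # Virtual symbol: insert a node without reading any input.
            children.append(("virtual", symbol.strip('"')))
        elif position < len(tokens) and tokens[position] == symbol:
            children.append(("token", symbol))
            position += 1
        else:
            return None
    # The whole input must have been consumed.
    return children if position == len(tokens) else None


nodes = match_sequence(["a", "b", '"my_virtual"', "c"], ["a", "b", "c"])
```

Here the input contains only three tokens, while the resulting node list has four entries, the third one being the inserted virtual node.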
The following operators specify the cardinality of the element that precedes them:
* : zero or more occurrences
+ : one or more occurrences
? : zero or one occurrence (optional element)
Examples:
namespace_declaration -> 'namespace' name_nmspce_type namespace_body ';'? ;
class_body -> '{' class_member* '}' ;
Internally, the parser generator "flattens" the provided syntactic rules to make them usable. This process does not affect how rules are written and is transparent to the user; it is described here for reference only. Formally, the body of a syntactic rule can only be a (possibly empty) sequence of symbols, that is to say terminals and variables. The operators provided by the tool improve expressivity, but they cannot be used directly to generate a parser. Below, _genvar designates an automatically generated new variable.
a -> x? ;
// rewritten as
a -> x ;
a -> ;
a -> x y* ;
// rewritten as
_genvar -> y ;
_genvar -> _genvar y ;
a -> x ;
a -> x _genvar ;
a -> x y+ ;
// rewritten as
_genvar -> y ;
_genvar -> _genvar y ;
a -> x _genvar ;
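The three rewrites above can be reproduced with a short Python sketch. It is an illustration of the transformation, not the generator's actual code: the function name flatten_cardinality, the list-of-strings rule representation, and the numbered helper names (_genvar1, _genvar2, ...) are assumptions made for this example.

```python
from itertools import count, product


def flatten_cardinality(head, body):
    """Expand ?, * and + suffixes in a rule body into plain rules.

    `body` is a list of symbol names, each optionally followed by one of
    the suffixes ?, * or +. Returns a list of (head, symbols) pairs in
    which every body is a plain sequence of symbols.
    """
    fresh = count(1)
    generated = []  # rules defining the generated helper variables
    choices = []    # per position: the alternative symbol sequences
    for token in body:
        if token[-1] in ("?", "*", "+"):
            symbol, op = token[:-1], token[-1]
        else:
            symbol, op = token, ""
        if op == "?":
            # Optional: either the symbol or nothing.
            choices.append([[symbol], []])
        elif op in ("*", "+"):
            # Repetition: a left-recursive helper variable matches 1..n items.
            genvar = f"_genvar{next(fresh)}"
            generated.append((genvar, [symbol]))
            generated.append((genvar, [genvar, symbol]))
            # '*' additionally allows the helper to be absent.
            choices.append([[], [genvar]] if op == "*" else [[genvar]])
        else:
            choices.append([[symbol]])
    # One flat rule per combination of choices.
    flat = [(head, sum(combo, [])) for combo in product(*choices)]
    return generated + flat


for rule_head, symbols in flatten_cardinality("a", ["x", "y*"]):
    print(rule_head, "->", " ".join(symbols), ";")
```

For the body x y*, the printed rules have the same shape as the second rewrite above, with _genvar1 in place of _genvar.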
The union operator | marks an alternative. Example:
stmt_declaration -> stmt_declaration_var ';' | stmt_declaration_const ';' ;
The parser generator also flattens the union operator, turning each alternative into a separate rule:
a -> x | y ;
// rewritten as:
a -> x ;
a -> y ;
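The same rewrite can be sketched in a few lines of Python. This is a minimal illustration, assuming the rule is given as text and that no inline terminal contains the characters | or ->; the function name flatten_union is hypothetical and not part of the tool.

```python
def flatten_union(rule):
    """Split a rule with top-level alternatives into one rule per alternative.

    Assumes the text has the form `head -> alt1 | alt2 ... ;` and that no
    inline terminal contains '|' or '->'.
    """
    text = rule.strip()
    if text.endswith(";"):
        text = text[:-1]  # drop the terminating semicolon
    head, body = text.split("->", 1)
    head = head.strip()
    # Rebuild one rule per alternative, normalizing whitespace.
    return [" ".join([head, "->", *alt.split(), ";"]) for alt in body.split("|")]


for flat_rule in flatten_union(
    "stmt_declaration -> stmt_declaration_var ';' | stmt_declaration_const ';' ;"
):
    print(flat_rule)
```

Applied to the stmt_declaration example, this yields one rule ending in stmt_declaration_var ';' and one ending in stmt_declaration_const ';'.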
Use parentheses () to group elements in an expression:
stmt_declaration_var -> (type | 'var') var_declarators ;
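A grouped alternative describes the same language as the set of rules obtained by distributing each alternative over the surrounding symbols. The Python sketch below performs that expansion for a single, non-nested group. This page does not say how the tool itself handles groups, so this is only an equivalent formulation; the function name expand_group is hypothetical, and the body is assumed to contain no inline terminals made of parentheses or |.

```python
import re


def expand_group(head, body):
    """Expand one non-nested parenthesized group of alternatives.

    Assumes `body` contains at most one group and no inline terminals
    such as '(' or '|', which this simple pattern would misread.
    """
    match = re.search(r"\(([^()]*)\)", body)
    if match is None:
        # Nothing to expand: return the rule unchanged (whitespace normalized).
        return [" ".join([head, "->", *body.split(), ";"])]
    before, after = body[:match.start()], body[match.end():]
    rules = []
    for alternative in match.group(1).split("|"):
        # Substitute the alternative for the whole group.
        symbols = f"{before} {alternative} {after}".split()
        rules.append(" ".join([head, "->", *symbols, ";"]))
    return rules


for flat_rule in expand_group("stmt_declaration_var", "(type | 'var') var_declarators"):
    print(flat_rule)
```

For the example above, the result is one rule starting with type and one starting with 'var', both followed by var_declarators.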