Basics of Syntactic Rules

This page summarizes the basics of writing syntactic rules. These rules are used to define grammar variables and are written in the rules section of a grammar.

Rule expression

All syntactic rules are written as:

myvar -> body ;

This rule declares the grammar variable myvar and defines it with the context-free body that follows the arrow. Every rule ends with a semicolon (;). An example of a valid rule is:

stmt_while -> 'while' '(' expression ')' stmt_embedded ;

Order

Syntactic rules may appear in any order. Any variable defined in a grammar can be referenced from any syntactic rule, including rules that appear before its definition.

Reference to terminals and variables

The body of a syntactic rule can refer to terminals and variables defined elsewhere in the grammar. Terminals referenced in syntactic rules must be defined by a lexical rule in the terminals section of the grammar (see Terminals). Example:

type_parameter -> IDENTIFIER ;

Inline terminals

Inline terminals can be defined directly within syntactic rules by writing them between single quotes. They are restricted to simple text; more complex cases require a lexical rule. For example:

stmt_while -> 'while' '(' expression ')' stmt_embedded ;

Virtual symbols

Virtual symbols are neither terminals nor variables. Neither the lexer nor the parser matches them against the input. Instead, the parser inserts them at the position where they appear in a syntactic rule. For example:

variable -> a b "my_virtual" c ;

Note that virtual symbols are written between double quotes. When the parser applies this rule, it inserts the virtual symbol named my_virtual and creates a corresponding node in the produced AST. Virtual symbols can therefore serve as markers in the AST that a later traversal, for example by a semantic analyzer, can rely on.
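To make the idea of a marker node concrete, here is a minimal Python sketch of how a later pass might look for a virtual symbol in a tree. The Node class, its kind field, and the find_virtual helper are hypothetical names invented for this illustration; they are not the AST API of the generated parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical AST node, for illustration only.
    symbol: str
    kind: str  # "variable", "terminal" or "virtual"
    children: list = field(default_factory=list)


def find_virtual(node, name):
    """Yield every virtual node called `name`, in depth-first order."""
    if node.kind == "virtual" and node.symbol == name:
        yield node
    for child in node.children:
        yield from find_virtual(child, name)


# Tree shape assumed for: variable -> a b "my_virtual" c ;
tree = Node("variable", "variable", [
    Node("a", "variable"),
    Node("b", "variable"),
    Node("my_virtual", "virtual"),
    Node("c", "variable"),
])

markers = list(find_virtual(tree, "my_virtual"))
```

A semantic analyzer could use such a marker to decide how to treat its sibling nodes without re-examining the input text.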

Repetition operators

The following operators specify the cardinality of the element that precedes them:

  • Zero or more: *
  • One or more: +
  • Optional (0 or 1 time): ?

Examples:

namespace_declaration -> 'namespace' name_nmspce_type namespace_body ';'? ;
class_body -> '{' class_member* '}' ;

Internally, the parser generator "flattens" the provided syntactic rules to make them usable. This process is transparent to the user and should not affect how the rules are written; it is described here for reference only. Formally, the body of a syntactic rule can only be a (possibly empty) sequence of symbols, that is, terminals and variables. The operators provided by the tool add expressivity, but they cannot be used directly to generate a parser. Below, _genvar designates an automatically generated new variable.

a -> x? ;
// rewritten as
a -> x ;
a -> ;

a -> x y* ;
// rewritten as
_genvar -> y ;
_genvar -> _genvar y ;
a -> x ;
a -> x _genvar ;

a -> x y+ ;
// rewritten as
_genvar -> y ;
_genvar -> _genvar y ;
a -> x _genvar ;
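The rewriting shown above can be sketched in a few lines of Python. This is only an illustration of the transformation, not the generator's actual implementation: the function name flatten_repetitions, the list-of-strings representation of a body, and the numbered helper names (_genvar1, _genvar2, ...) are assumptions made for the example.

```python
from itertools import count


def flatten_repetitions(head, body, fresh=None):
    """Expand ?, * and + suffixes in one rule body into plain rules.

    `body` is a list of symbol strings such as ["x", "y*"]. The result is a
    list of (head, [symbols]) pairs in which no operator remains.
    """
    if fresh is None:
        fresh = count(1)
    generated = []        # rules defining the helper variables
    alternatives = [[]]   # bodies for `head`, built left to right
    for token in body:
        # A trailing ?, * or + is an operator unless the token is that
        # single character itself.
        op = token[-1] if len(token) > 1 and token[-1] in "?*+" else ""
        symbol = token[:-1] if op else token
        if op in ("*", "+"):
            # One-or-more repetitions go through a left-recursive helper.
            helper = f"_genvar{next(fresh)}"
            generated.append((helper, [symbol]))
            generated.append((helper, [helper, symbol]))
            present = helper
        else:
            present = symbol
        with_symbol = [alt + [present] for alt in alternatives]
        if op in ("?", "*"):
            # The element may also be absent: keep the shorter bodies too.
            alternatives = with_symbol + alternatives
        else:
            alternatives = with_symbol
    return generated + [(head, alt) for alt in alternatives]


for rule_head, rule_body in flatten_repetitions("a", ["x", "y*"]):
    print(rule_head, "->", " ".join(rule_body), ";")
```

The order in which the resulting rules are listed is arbitrary in this sketch; only the set of rules matters.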

Union

The union operator | marks an alternative. Example:

stmt_declaration -> stmt_declaration_var ';' | stmt_declaration_const ';' ;

As with the repetition operators, the parser generator flattens unions internally, so that each alternative becomes a separate rule:

a -> x | y ;
// rewritten as:
a -> x ;
a -> y ;
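A minimal Python sketch of this splitting step follows. It assumes a body given as a whitespace-separated string with no grouping, and the name flatten_union is made up for the example; it does not reflect the generator's internal code.

```python
def flatten_union(head, body):
    """Split a rule body on top-level `|` into one plain rule per alternative.

    Assumes whitespace-separated symbols and no parentheses. An inline
    terminal such as '|' is kept intact because its token includes the quotes.
    """
    alternatives = [[]]
    for token in body.split():
        if token == "|":
            alternatives.append([])   # start the next alternative
        else:
            alternatives[-1].append(token)
    return [(head, alt) for alt in alternatives]


rules = flatten_union(
    "stmt_declaration",
    "stmt_declaration_var ';' | stmt_declaration_const ';'",
)
for rule_head, rule_body in rules:
    print(rule_head, "->", " ".join(rule_body), ";")
```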

Grouping

Use parentheses () to group elements in an expression:

stmt_declaration_var -> (type | 'var') var_declarators ;
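This page does not describe how groups are handled internally, so the following Python sketch only shows which plain rules a grouped body is equivalent to. It handles sequences, unions and parentheses, and leaves repetition operators out of scope. The names TOKEN, tokenize and expand are hypothetical.

```python
import re

# Tokens: inline terminals, virtual symbols, operators, and plain names.
TOKEN = re.compile(r"""
    '[^']*'          # inline terminal, e.g. 'var' or '('
  | "[^"]*"          # virtual symbol
  | [()|]            # grouping and union operators
  | [^\s()|'"]+      # variable or terminal name
""", re.VERBOSE)


def tokenize(body):
    return TOKEN.findall(body)


def expand(tokens):
    """Return every plain symbol sequence described by `tokens`."""
    alternatives, pos = _parse_union(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return alternatives


def _parse_union(tokens, pos):
    results = []
    current = [[]]   # sequences for the alternative being read
    while pos < len(tokens) and tokens[pos] != ")":
        token = tokens[pos]
        if token == "|":
            results.extend(current)
            current = [[]]
            pos += 1
        elif token == "(":
            inner, pos = _parse_union(tokens, pos + 1)
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # skip the closing parenthesis
            # Append each inner alternative to each sequence read so far.
            current = [seq + tail for seq in current for tail in inner]
        else:
            current = [seq + [token] for seq in current]
            pos += 1
    results.extend(current)
    return results, pos


bodies = expand(tokenize("(type | 'var') var_declarators"))
for body in bodies:
    print("stmt_declaration_var ->", " ".join(body), ";")
```

Quoted parentheses such as '(' are read as inline terminals rather than grouping operators, because the tokenizer matches quoted text first.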