
Lexical fragments

This page explains the use of lexical fragments in Hime 2.0.0 and up. The basic features of lexical rules are explained here. A lexical fragment is a lexical rule that can be used to build the definition of other terminals but is never matched by the lexer on its own. This is useful when a terminal has a complex definition that is best split into multiple rules whose individual parts should not be matched themselves. For example, consider:

INTEGER -> '0' | [1-9] [0-9]* ;
EXPONENT -> [eE] ('+'|'-')? INTEGER ;
REAL -> INTEGER? '.' INTEGER EXPONENT? | INTEGER EXPONENT ;

In this example, the INTEGER rule defines how to match an integer in decimal notation. The EXPONENT rule then defines how to match an exponent and reuses the definition of INTEGER. Finally, the REAL rule uses the two previous rules to define how to match a floating-point number. The EXPONENT rule is useful here because it simplifies the definition of REAL. However, as written, it is also a terminal in its own right and can be matched by the generated lexer. Consider the following input:
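To make the composition of the three rules concrete, here is a minimal sketch that translates them into Python regular expressions. It is not Hime syntax and not how Hime compiles terminals; the helper name `matches` and the sample strings are assumptions chosen for illustration.

```python
import re

# Python regex translations of the three rules (an approximation, not Hime syntax).
INTEGER = r"(?:0|[1-9][0-9]*)"
# EXPONENT reuses INTEGER, as in the grammar.
EXPONENT = rf"(?:[eE][+-]?{INTEGER})"
# REAL reuses both INTEGER and EXPONENT.
REAL = rf"(?:{INTEGER}?\.{INTEGER}{EXPONENT}?|{INTEGER}{EXPONENT})"


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# For each sample, record which of the three rules accepts the whole string.
samples = ["12", "3.14", ".5e-3", "2E10", "e+1", "1."]
results = {
    text: [name for name, pattern in
           (("INTEGER", INTEGER), ("EXPONENT", EXPONENT), ("REAL", REAL))
           if matches(pattern, text)]
    for text in samples
}
for text, names in results.items():
    print(f"{text!r}: {names}")
```

In this sketch, `e+1` is accepted by EXPONENT alone, which is the situation examined next. Note also that, under this translation, the fractional part must itself be an INTEGER, so a string such as `3.05` is not accepted as a single REAL.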

x = e+1

With this input, the lexer would match the e+1 part as an EXPONENT terminal, which is probably not what was intended. One way to prevent this is to remove the EXPONENT rule and inline its definition wherever it is used, but this makes the grammar more complex. To keep the EXPONENT rule while preventing it from being matched, prefix it with the fragment keyword:

fragment EXPONENT -> [eE] ('+'|'-')? INTEGER ;

This means that the EXPONENT rule only defines a fragment of a terminal that can be reused in the definition of other terminals; it can never be matched by itself. The lexer will never produce an EXPONENT terminal.
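The effect of the fragment keyword can be simulated with a small longest-match tokenizer in Python. This is a simplified model under stated assumptions, not Hime's actual lexer: the IDENTIFIER, ASSIGN and PLUS rules are hypothetical additions needed to tokenize the sample input, and a fragment is modeled simply by leaving EXPONENT out of the list of matchable rules while still using it inside REAL.

```python
import re

# Regex translations of the rules from this page (approximation, not Hime syntax).
INTEGER = r"(?:0|[1-9][0-9]*)"
EXPONENT = rf"(?:[eE][+-]?{INTEGER})"
REAL = rf"(?:{INTEGER}?\.{INTEGER}{EXPONENT}?|{INTEGER}{EXPONENT})"

# IDENTIFIER, ASSIGN and PLUS are hypothetical rules, added only so that
# the sample input can be tokenized; they are not part of the page's grammar.
BASE_RULES = [
    ("IDENTIFIER", r"[a-z]+"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("INTEGER", INTEGER),
    ("REAL", REAL),  # REAL still embeds EXPONENT in both scenarios
]


def tokenize(text, rules):
    """Simplified longest-match tokenizer that skips whitespace."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_end = None, pos
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Keep the rule that consumes the most characters.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# EXPONENT as an ordinary terminal: it is matchable on its own.
with_terminal = tokenize("x = e+1", BASE_RULES + [("EXPONENT", EXPONENT)])
# EXPONENT as a fragment: it is only reachable through REAL.
with_fragment = tokenize("x = e+1", BASE_RULES)

print(with_terminal)
print(with_fragment)
```

In the first run the `e+1` part comes out as a single EXPONENT token; in the second it is split into an identifier, a plus sign and an integer, while REAL can still use the exponent definition internally.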