Get started
Grammars and edition
API Documentation
Release Notes
In the previous tutorial, we wrote a grammar for simple mathematic expressions, compiled it with the himecc tool to generate the corresponding parser and used the result to parse an expression. The produced Abstract Syntax Tree (AST) corresponded exactly to the grammar, but had some shortcomings. It contained numerous nodes for the variables that may hinder the further analysis of the AST. Let’s take a slightly more complex example. At this point, parsing the (2+3)*4
expression would produce the following AST:
exp +-> exp_term +-> exp_factor +-> exp_factor | +-> exp_atom | +-> ( = ( | +-> exp | | +-> exp_term | | +-> exp_term | | | +-> exp_factor | | | +-> exp_atom | | | +-> NUMBER = 2 | | +-> + = + | | +-> exp_factor | | +-> exp_atom | | +-> NUMBER = 3 | +-> ) = ) +-> * = * +-> exp_atom +-> NUMBER = 4
This AST contains all the nodes that correspond to the variables in the grammar, e.g. exp_factor
, etc. However, it is hard to see the structure of the mathematical expression in this AST. What we really want to have an AST that only contains the intended semantic of the mathematical expression. It would be much simpler to have an AST with a shape much closer to the conceptual arithmetic expression, such as:
* = * +-> + = + | +-> NUMBER = 2 | +-> NUMBER = 3 +-> NUMBER = 4
In this tutorial, we will extend our previous specification of mathematic expression to include tree actions, i.e. actions that will be applied to the produced AST in order to produce a more meaningful one and prune the unnecessary details in order to produce to the latter AST.
The first and simplest tree action is the Drop
action. It specifies that the node marked with this action, and all its children should be completely removed from the AST. For example, let’s say we do not want to see the tokens corresponding to the brackets in the final AST. What we have to do is to mark the corresponding two nodes with Drop actions. To do so, we append a special operator in the grammar rules. The exp_atom
rules then become as follow:
exp_atom -> NUMBER
| '('! exp ')'! ;
The terminals for the brackets are followed by the!
character, which means that the AST node corresponding to them will be applied theDrop
action.
The Promote
action is used to mark a node in the AST that should replace its parent. The node itself is removed from its collection of children and takes the place of its parent. If the node to be promoted has children, then they are put in the original place of the marked node. The Promote
action is specified in the grammar rules using the ^
symbol. It is placed behind the element that will correspond to the AST node that should be promoted. For example, we further amend theexp_atom
rule as follow:
exp_atom -> NUMBER^
| '('! exp^ ')'! ;
This rule has two alternatives, separated with the |
character. Now, we applied the Promote
action using the ^
operator. In the first alternative we specify that the node corresponding to the NUMBER
token will replace its parent, i.e. the node corresponding to the exp_atom variable. Furthermore, in the second alternative, we specify that not only should the brackets be discarded, but that the node corresponding to the exp element should replace the parent. We modify the rest of the rules in the same fashion as follow:
exp_factor -> exp_atom^
| exp_factor '*'^ exp_atom
| exp_factor '/'^ exp_atom ;
exp_term -> exp_factor^
| exp_term '+'^ exp_factor
| exp_term '-'^ exp_factor ;
exp -> exp_term^ ;
In this example, all the nodes for the mathematical operators have to be promoted to replace their respective parents. Furthermore, for the alternatives where the expression at one lever defers to the rule as the level below, we also specify that the node from the lower level should replace the node at the upper level. It is important to note that the Promote
actions will chain so that a single node can be promoted multiple times. For example, the NUMBER
node produced will be promoted in the rule
exp_atom -> NUMBER^;
Then as it replaces its parent, it can also be promoted by the rule hereafter and so forth
exp_factor -> exp_atom^;
With the modifications prescribed above, the final grammar with all the tree actions should look like this:
grammar MathExp
{
options
{
Axiom = "exp";
Separator = "SEPARATOR";
}
terminals
{
WHITE_SPACE -> U+0020 | U+0009 | U+000B | U+000C ;
SEPARATOR -> WHITE_SPACE+;
INTEGER -> [1-9] [0-9]* | '0' ;
REAL -> INTEGER? '.' INTEGER (('e' | 'E') ('+' | '-')? INTEGER)?
| INTEGER ('e' | 'E') ('+' | '-')? INTEGER ;
NUMBER -> INTEGER | REAL ;
}
rules
{
exp_atom -> NUMBER^
| '('! exp^ ')'! ;
exp_factor -> exp_atom^
| exp_factor '*'^ exp_atom
| exp_factor '/'^ exp_atom ;
exp_term -> exp_factor^
| exp_term '+'^ exp_factor
| exp_term '-'^ exp_factor ;
exp -> exp_term^ ;
}
}
Now let's test the result.
Regenerate the lexer and parser for the new grammar:
himecc.bat MathExp.gram
or explicitly withnet461/himecc.exe MathExp.gram
./himecc MathExp.gram
dotnet netcore20/himecc.dll MathExp.gram
mono net461/himecc.exe MathExp.gram
Replace the old files in your test project by the newly generated ones.
Then rebuild the test project and run.
msbuild
bin/Debug/net461/TestHime.exe
, or with Monomono bin/Debug/net461/TestHime.exe
dotnet build
dotnet bin/Debug/netcoreapp2.0/TestHime.dll
The output of the program should be the simplified AST, the same as shown at the beginning of this tutorial:
* = * +-> + = + | +-> NUMBER = 2 | +-> NUMBER = 3 +-> NUMBER = 4
In this tutorial we managed to simplify the AST produced by the parser so that it is closer to the intended semantics and therefore easier to use and less cluttered. To achieve this we used simple tree actions; the Drop
action to remove sub-trees and the Promote
action to modify the tree's hierarchy.
Now that we have a nice AST, it would be nice to use it for the evaluation for the parsed arithmetic expression. Go to the next tutorialto see how to use semantic actions for this purpose.