Tutorial 2  Using Tree Actions
In the
previous tutorial, we wrote a grammar for simple mathematic expressions, compiled it
with the himecc tool to generate the corresponding parser and used the result to parse an
expression. The produced Abstract Syntax Tree (AST) corresponded exactly to the grammar,
but had some shortcomings. It contained numerous nodes for the variables that may hinder
the further analysis of the AST. Let’s take a slightly more complex example. At this point,
parsing the
(2+3)*4
expression would produce the following AST:
exp +> exp_term +> exp_factor +> exp_factor  +> exp_atom  +> ( = (  +> exp   +> exp_term   +> exp_term    +> exp_factor    +> exp_atom    +> NUMBER = 2   +> + = +   +> exp_factor   +> exp_atom   +> NUMBER = 3  +> ) = ) +> * = * +> exp_atom +> NUMBER = 4
This AST contains all the nodes that correspond to the variables in the grammar, e.g.
exp_factor
, etc. However, it is hard to see the structure of the mathematical expression in this AST.
What we really want to have an AST that only contains the intended semantic of the mathematical
expression. It would be much simpler to have an AST with a shape much closer to the conceptual
arithmetic expression, such as:
* = * +> + = +  +> NUMBER = 2  +> NUMBER = 3 +> NUMBER = 4
In this tutorial, we will extend our previous specification of mathematic expression to include tree actions, i.e. actions that will be applied to the produced AST in order to produce a more meaningful one and prune the unnecessary details in order to produce to the latter AST.
The Drop action
The first and simplest tree action is the
Drop
action. It specifies that the node marked with this action, and all its children should
be completely removed from the AST. For example, let’s say we do not want to see the tokens
corresponding to the brackets in the final AST. What we have to do is to mark the corresponding
two nodes with Drop actions. To do so, we append a special operator in the grammar rules.
The
exp_atom
rules then become as follow:
exp_atom > NUMBER
 '('! exp ')'! ;
The terminals for the brackets are followed by the
!
character, which means that the AST node corresponding to them will be applied the
Drop
action.
The Promote action
The
Promote
action is used to mark a node in the AST that should replace its parent. The node itself
is removed from its collection of children and takes the place of its parent. If the node
to be promoted has children, then they are put in the original place of the marked node.
The
Promote
action is specified in the grammar rules using the
^
symbol. It is placed behind the element that will correspond to the AST node that should
be promoted. For example, we further amend the
exp_atom
rule as follow:
exp_atom > NUMBER^
 '('! exp^ ')'! ;
This rule has two alternatives, separated with the

character. Now, we applied the
Promote
action using the
^
operator. In the first alternative we specify that the node corresponding to the
NUMBER
token will replace its parent, i.e. the node corresponding to the exp_atom variable. Furthermore,
in the second alternative, we specify that not only should the brackets be discarded, but
that the node corresponding to the exp element should replace the parent. We modify the rest
of the rules in the same fashion as follow:
exp_factor > exp_atom^
 exp_factor '*'^ exp_atom
 exp_factor '/'^ exp_atom ;
exp_term > exp_factor^
 exp_term '+'^ exp_factor
 exp_term ''^ exp_factor ;
exp > exp_term^ ;
In this example, all the nodes for the mathematical operators have to be promoted to replace their respective parents. Furthermore,
for the alternatives where the expression at one lever defers to the rule as the level below,
we also specify that the node from the lower level should replace the node at the upper level.
It is important to note that the
Promote
actions will chain so that a single node can be promoted multiple times. For example, the
NUMBER
node produced will be promoted in the rule
exp_atom > NUMBER^;
Then as it replaces its parent, it can also be promoted by the rule hereafter and so forth
exp_factor > exp_atom^;
Application
With the modifications prescribed above, the final grammar with all the tree actions should look like this:
grammar MathExp
{
options
{
Axiom = "exp";
Separator = "SEPARATOR";
}
terminals
{
WHITE_SPACE > U+0020  U+0009  U+000B  U+000C ;
SEPARATOR > WHITE_SPACE+;
INTEGER > [19] [09]*  '0' ;
REAL > INTEGER? '.' INTEGER (('e'  'E') ('+'  '')? INTEGER)?
 INTEGER ('e'  'E') ('+'  '')? INTEGER ;
NUMBER > INTEGER  REAL ;
}
rules
{
exp_atom > NUMBER^
 '('! exp^ ')'! ;
exp_factor > exp_atom^
 exp_factor '*'^ exp_atom
 exp_factor '/'^ exp_atom ;
exp_term > exp_factor^
 exp_term '+'^ exp_factor
 exp_term ''^ exp_factor ;
exp > exp_term^ ;
}
}
Now let's test the result.
Regenerate the lexer and parser for the new grammar:
 On Windows: himecc.bat MathExp.gram or explicitly with net461/himecc.exe MathExp.gram
 On Linux and MacOS: ./himecc MathExp.gram
 On any OS, explicitly with .Net Core: dotnet netcore20/himecc.dll MathExp.gram
 On any OS, explicitly with Mono: mono net461/himecc.exe MathExp.gram
Replace the old files in your test project by the newly generated ones.
Then rebuild the test project and run.

Using the .Net Framework or Mono:
msbuild
Then: bin/Debug/net461/TestHime.exe, or with Mono mono bin/Debug/net461/TestHime.exe 
Using .Net Core:
dotnet build
Then: dotnet bin/Debug/netcoreapp2.0/TestHime.dll
The output of the program should be the simplified AST, the same as shown at the beginning of this tutorial:
* = * +> + = +  +> NUMBER = 2  +> NUMBER = 3 +> NUMBER = 4
Conclusion
In this tutorial we managed to simplify the AST produced by the parser so that it is closer to the intended semantics and
therefore easier to use and less cluttered. To achieve this we used simple tree actions;
the
Drop
action to remove subtrees and the
Promote
action to modify the tree's hierarchy.
Now that we have a nice AST, it would be nice to use it for the evaluation for the parsed arithmetic expression. Go to the next tutorial to see how to use semantic actions for this purpose.