Tutorial 2 - Using Tree Actions

In the previous tutorial, we wrote a grammar for simple mathematic expressions, compiled it with the himecc tool to generate the corresponding parser and used the result to parse an expression. The produced Abstract Syntax Tree (AST) corresponded exactly to the grammar, but had some shortcomings. It contained numerous nodes for the variables that may hinder the further analysis of the AST. Let’s take a slightly more complex example. At this point, parsing the (2+3)*4 expression would produce the following AST:

exp
+-> exp_term
    +-> exp_factor
        +-> exp_factor
        |   +-> exp_atom
        |       +-> ( = (
        |       +-> exp
        |       |   +-> exp_term
        |       |       +-> exp_term
        |       |       |   +-> exp_factor
        |       |       |       +-> exp_atom
        |       |       |           +-> NUMBER = 2
        |       |       +-> + = +
        |       |       +-> exp_factor
        |       |           +-> exp_atom
        |       |               +-> NUMBER = 3
        |       +-> ) = )
        +-> * = *
        +-> exp_atom
            +-> NUMBER = 4

This AST contains all the nodes that correspond to the variables in the grammar, e.g. exp_factor, etc. However, it is hard to see the structure of the mathematical expression in this AST. What we really want to have an AST that only contains the intended semantic of the mathematical expression. It would be much simpler to have an AST with a shape much closer to the conceptual arithmetic expression, such as:

* = *
+-> + = +
|   +-> NUMBER = 2
|   +-> NUMBER = 3
+-> NUMBER = 4

In this tutorial, we will extend our previous specification of mathematic expression to include tree actions, i.e. actions that will be applied to the produced AST in order to produce a more meaningful one and prune the unnecessary details in order to produce to the latter AST.

The Drop action

The first and simplest tree action is the Drop action. It specifies that the node marked with this action, and all its children should be completely removed from the AST. For example, let’s say we do not want to see the tokens corresponding to the brackets in the final AST. What we have to do is to mark the corresponding two nodes with Drop actions. To do so, we append a special operator in the grammar rules. The exp_atom rules then become as follow:

exp_atom -> NUMBER
         | '('! exp ')'! ;

The terminals for the brackets are followed by the ! character, which means that the AST node corresponding to them will be applied the Drop action.

The Promote action

The Promote action is used to mark a node in the AST that should replace its parent. The node itself is removed from its collection of children and takes the place of its parent. If the node to be promoted has children, then they are put in the original place of the marked node. The Promote action is specified in the grammar rules using the ^ symbol. It is placed behind the element that will correspond to the AST node that should be promoted. For example, we further amend the exp_atom rule as follow:

exp_atom -> NUMBER
         | '('! exp ')'! ;

This rule has two alternatives, separated with the | character. Now, we applied the Promote action using the ^ operator. In the first alternative we specify that the node corresponding to the NUMBER token will replace its parent, i.e. the node corresponding to the exp_atom variable. Furthermore, in the second alternative, we specify that not only should the brackets be discarded, but that the node corresponding to the exp element should replace the parent. We modify the rest of the rules in the same fashion as follow:

exp_factor -> exp_atom^
           |  exp_factor '*'^ exp_atom
           |  exp_factor '/'^ exp_atom ;
exp_term   -> exp_factor^
           |  exp_term '+'^ exp_factor
           |  exp_term '-'^ exp_factor ;
exp        -> exp_term^ ;

In this example, all the nodes for the mathematical operators have to be promoted to replace their respective parents. Furthermore, for the alternatives where the expression at one lever defers to the rule as the level below, we also specify that the node from the lower level should replace the node at the upper level. It is important to note that the Promote actions will chain so that a single node can be promoted multiple times. For example, the NUMBER node produced will be promoted in the rule

exp_atom -> NUMBER^;

Then as it replaces its parent, it can also be promoted by the rule hereafter and so forth

exp_factor -> exp_atom^;

Application

With the modifications prescribed above, the final grammar with all the tree actions should look like this:

grammar MathExp
{
    options
    {
        Axiom = "exp";
        Separator = "SEPARATOR";
    }
    terminals
    {
        WHITE_SPACE -> U+0020 | U+0009 | U+000B | U+000C ;
        SEPARATOR   -> WHITE_SPACE+;

        INTEGER     -> [1-9] [0-9]* | '0' ;
        REAL        -> INTEGER? '.' INTEGER  (('e' | 'E') ('+' | '-')? INTEGER)?
                    |  INTEGER ('e' | 'E') ('+' | '-')? INTEGER ;
        NUMBER      -> INTEGER | REAL ;
    }
    rules
    {
        exp_atom   -> NUMBER^
                   | '('! exp^ ')'! ;
        exp_factor -> exp_atom^
                   |  exp_factor '*'^ exp_atom
                   |  exp_factor '/'^ exp_atom ;
        exp_term   -> exp_factor^
                   |  exp_term '+'^ exp_factor
                   |  exp_term '-'^ exp_factor ;
        exp        -> exp_term^ ;
    }
}

Now let's test the result.
Regenerate the lexer and parser for the new grammar:

  • On Windows: himecc.bat MathExp.gram -t:java or explicitly with net461/himecc.exe MathExp.gram -t:java
  • On Linux and MacOS: ./himecc MathExp.gram -t:java
  • On any OS, explicitly with .Net Core: dotnet netcore20/himecc.dll MathExp.gram -t:java
  • On any OS, explicitly with Mono: mono net461/himecc.exe MathExp.gram -t:java

Replace the old files in your test project by the newly generated ones.
Then rebuild the test project and run.

mvn clean verify
and then
java -jar target/test-hime-1.0.0-jar-with-dependencies.jar

The output of the program should be the simplified AST, the same as shown at the beginning of this tutorial:

* = *
+-> + = +
|   +-> NUMBER = 2
|   +-> NUMBER = 3
+-> NUMBER = 4

Conclusion

In this tutorial we managed to simplify the AST produced by the parser so that it is closer to the intended semantics and therefore easier to use and less cluttered. To achieve this we used simple tree actions; the Drop action to remove sub-trees and the Promote action to modify the tree's hierarchy.

Now that we have a nice AST, it would be nice to use it for the evaluation for the parsed arithmetic expression. Go to the next tutorial to see how to use semantic actions for this purpose.