Hime - Tutorial 2 (C#) - Tree actions

Table of content

Home
- Downloads
Get started
- Kickstart in: C#, Java, Rust
- C# tutorials: Basics, Tree actions, Semantic actions
- Java tutorials: Basics, Tree actions, Semantic actions
- Rust tutorials: Basics, Tree actions, Semantic actions
Grammars and edition
- Editors for Hime grammars
- Library of Hime grammars
Reference
- Command Line: himecc
- Grammar Language
  - Grammar Inheritance
  - Grammar Options
  - Lexical Rules
  - Syntactic Rules
- Bibliography
API Documentation
- API Documentation v3.5.0 .Net, Java, Rust
- API Documentation v3.4.0 .Net, Java, Rust
- API Documentation v3.3.2 .Net, Java, Rust
- API Documentation v3.3.1 .Net, Java, Rust
- API Documentation v3.3.0 .Net, Java, Rust
- API Documentation v3.2.0 .Net, Java
- API Documentation v3.1.0 .Net, Java
- API Documentation v3.0.0 .Net, Java
- API Documentation v2.0.6 .Net, Java
- API Documentation v2.0.5 .Net, Java
- API Documentation v2.0.1 .Net, Java
- API Documentation v1.3.2 .Net, Java
- API Documentation v1.2.0 .Net, Java
- API Documentation v1.1.0 .Net, Java
- API Documentation v1.0.0 .Net, Java
Release Notes
- v3.5.1, 2020, August 6th.
- v3.5.0, 2020, May 11th.
- v3.4.1, 2019, January 10th.
- v3.4.0, 2018, August 9th.
- v3.3.2, 2018, May 18th.
- v3.3.1, 2018, February 18th.
- v3.3.0, 2018, January 24th.
- v3.2.2, 2017, Octobre 19th.
- v3.2.1, 2017, Octobre 15th.
- v3.2.0, 2017, Octobre 4th.
- v3.1.0, 2017, September 26th.
- v3.0.1, 2017, August 3rd.
- v3.0.0, 2017, May 4th.
- v2.0.6, 2017, February 7th.
- v2.0.5, 2016, September 10th.
- v2.0.4, 2016, March 29th.
- v2.0.3, 2016, March 1st.
- v2.0.2, 2016, January 20th.
- v2.0.1, 2015, October 25th.
- v1.3.2, 2015, January 22nd.
- v1.3.1, 2014, October 23rd.
- v1.3.0, 2014, September 16th.
- v1.2.0, 2014, August 14th.
- v1.1.0, 2014, May 29th.
- v1.0.0, 2014, May 12th.

Tutorial 2 (C#) - Tree actions

In the previous tutorial, we wrote a grammar for simple mathematic expressions, compiled it with the himecc tool to generate the corresponding parser and used the result to parse an expression. The produced Abstract Syntax Tree (AST) corresponded exactly to the grammar, but had some shortcomings. It contained numerous nodes for the variables that may hinder the further analysis of the AST. Let’s take a slightly more complex example. At this point, parsing the (2+3)*4 expression would produce the following AST:

exp
+-> exp_term
    +-> exp_factor
        +-> exp_factor
        |   +-> exp_atom
        |       +-> ( = (
        |       +-> exp
        |       |   +-> exp_term
        |       |       +-> exp_term
        |       |       |   +-> exp_factor
        |       |       |       +-> exp_atom
        |       |       |           +-> NUMBER = 2
        |       |       +-> + = +
        |       |       +-> exp_factor
        |       |           +-> exp_atom
        |       |               +-> NUMBER = 3
        |       +-> ) = )
        +-> * = *
        +-> exp_atom
            +-> NUMBER = 4

This AST contains all the nodes that correspond to the variables in the grammar, e.g. exp_factor, etc. However, it is hard to see the structure of the mathematical expression in this AST. What we really want to have an AST that only contains the intended semantic of the mathematical expression. It would be much simpler to have an AST with a shape much closer to the conceptual arithmetic expression, such as:

* = *
+-> + = +
|   +-> NUMBER = 2
|   +-> NUMBER = 3
+-> NUMBER = 4

In this tutorial, we will extend our previous specification of mathematic expression to include tree actions, i.e. actions that will be applied to the produced AST in order to produce a more meaningful one and prune the unnecessary details in order to produce to the latter AST.

The Drop action

The first and simplest tree action is the Drop action. It specifies that the node marked with this action, and all its children should be completely removed from the AST. For example, let’s say we do not want to see the tokens corresponding to the brackets in the final AST. What we have to do is to mark the corresponding two nodes with Drop actions. To do so, we append a special operator in the grammar rules. The exp_atom rules then become as follow:

exp_atom -> NUMBER
         | '('! exp ')'! ;

The terminals for the brackets are followed by the! character, which means that the AST node corresponding to them will be applied theDrop action.

The Promote action

The Promote action is used to mark a node in the AST that should replace its parent. The node itself is removed from its collection of children and takes the place of its parent. If the node to be promoted has children, then they are put in the original place of the marked node. The Promote action is specified in the grammar rules using the ^ symbol. It is placed behind the element that will correspond to the AST node that should be promoted. For example, we further amend theexp_atom rule as follow:

exp_atom -> NUMBER^
         | '('! exp^ ')'! ;

This rule has two alternatives, separated with the | character. Now, we applied the Promote action using the ^ operator. In the first alternative we specify that the node corresponding to the NUMBER token will replace its parent, i.e. the node corresponding to the exp_atom variable. Furthermore, in the second alternative, we specify that not only should the brackets be discarded, but that the node corresponding to the exp element should replace the parent. We modify the rest of the rules in the same fashion as follow:

exp_factor -> exp_atom^
           |  exp_factor '*'^ exp_atom
           |  exp_factor '/'^ exp_atom ;
exp_term   -> exp_factor^
           |  exp_term '+'^ exp_factor
           |  exp_term '-'^ exp_factor ;
exp        -> exp_term^ ;

In this example, all the nodes for the mathematical operators have to be promoted to replace their respective parents. Furthermore, for the alternatives where the expression at one lever defers to the rule as the level below, we also specify that the node from the lower level should replace the node at the upper level. It is important to note that the Promote actions will chain so that a single node can be promoted multiple times. For example, the NUMBER node produced will be promoted in the rule

exp_atom -> NUMBER^;

Then as it replaces its parent, it can also be promoted by the rule hereafter and so forth

exp_factor -> exp_atom^;

Application

With the modifications prescribed above, the final grammar with all the tree actions should look like this:

grammar MathExp
{
    options
    {
        Axiom = "exp";
        Separator = "SEPARATOR";
    }
    terminals
    {
        WHITE_SPACE -> U+0020 | U+0009 | U+000B | U+000C ;
        SEPARATOR   -> WHITE_SPACE+;

        INTEGER     -> [1-9] [0-9]* | '0' ;
        REAL        -> INTEGER? '.' INTEGER  (('e' | 'E') ('+' | '-')? INTEGER)?
                    |  INTEGER ('e' | 'E') ('+' | '-')? INTEGER ;
        NUMBER      -> INTEGER | REAL ;
    }
    rules
    {
        exp_atom   -> NUMBER^
                   | '('! exp^ ')'! ;
        exp_factor -> exp_atom^
                   |  exp_factor '*'^ exp_atom
                   |  exp_factor '/'^ exp_atom ;
        exp_term   -> exp_factor^
                   |  exp_term '+'^ exp_factor
                   |  exp_term '-'^ exp_factor ;
        exp        -> exp_term^ ;
    }
}

Now let's test the result.
Regenerate the lexer and parser for the new grammar:

On Windows:himecc.bat MathExp.gram or explicitly withnet461/himecc.exe MathExp.gram
On Linux and MacOS:./himecc MathExp.gram
On any OS, explicitly with .Net Core:dotnet netcore20/himecc.dll MathExp.gram
On any OS, explicitly with Mono:mono net461/himecc.exe MathExp.gram

Replace the old files in your test project by the newly generated ones.
Then rebuild the test project and run.

Using the .Net Framework or Mono:msbuild
Then:bin/Debug/net461/TestHime.exe, or with Monomono bin/Debug/net461/TestHime.exe
Using .Net Core:dotnet build
Then:dotnet bin/Debug/netcoreapp2.0/TestHime.dll

The output of the program should be the simplified AST, the same as shown at the beginning of this tutorial:

* = *
+-> + = +
|   +-> NUMBER = 2
|   +-> NUMBER = 3
+-> NUMBER = 4

Conclusion

In this tutorial we managed to simplify the AST produced by the parser so that it is closer to the intended semantics and therefore easier to use and less cluttered. To achieve this we used simple tree actions; the Drop action to remove sub-trees and the Promote action to modify the tree's hierarchy.

Now that we have a nice AST, it would be nice to use it for the evaluation for the parsed arithmetic expression. Go to the next tutorialto see how to use semantic actions for this purpose.