Tutorial 3 - Using Semantic Actions

The previous tutorials demonstrated how to write a grammar, compile it and use the generated lexer and parser. It also demonstrated how to use tree actions in order to improve the produced Abstract Syntax Tree (AST). In this tutorial, we go a step further. In our previous example, we wrote a grammar for simple mathematical expression. But what if we want to describe how the mathematic expressions have to be interpreted? In this tutorial, we see how we can define semantic actions executed during the parsing. Building upon our example, we will use semantic actions to define a simple interpreter of arithmetic expressions along our parser.

Here we start from the grammar produced a the end of the previous tutorial:

grammar MathExp
{
    options
    {
        Axiom = "exp";
        Separator = "SEPARATOR";
    }
    terminals
    {
        WHITE_SPACE -> U+0020 | U+0009 | U+000B | U+000C ;
        SEPARATOR   -> WHITE_SPACE+;

        INTEGER     -> [1-9] [0-9]* | '0' ;
        REAL        -> INTEGER? '.' INTEGER  (('e' | 'E') ('+' | '-')? INTEGER)?
                    |  INTEGER ('e' | 'E') ('+' | '-')? INTEGER ;
        NUMBER      -> INTEGER | REAL ;
    }
    rules
    {
        exp_atom   -> NUMBER^
                   | '('! exp^ ')'! ;
        exp_factor -> exp_atom^
                   |  exp_factor '*'^ exp_atom
                   |  exp_factor '/'^ exp_atom ;
        exp_term   -> exp_factor^
                   |  exp_term '+'^ exp_factor
                   |  exp_term '-'^ exp_factor ;
        exp        -> exp_term^ ;
    }
}

Semantic actions

First, semantic actions are pieces of behaviors that are introduced in grammar rules and executed when the parser encountered them. In Hime, we strongly believe in the separation of concerns and we do not write the content of the semantic actions within the grammar rules. Semantic actions are specified in the grammar rules, but must be implemented with external code. For example, we now amend the exp_atom rule definition to introduce a semantic action as follow:

exp_atom-> NUMBER^ @OnNumber
        | '('! exp^ ')'! ;

Here, the @OnNumber represents the semantic action. It is placed directly after the NUMBER element in the rule. The new meaning of this rule is that when the parser finds itself in this rule and immediately after it encounters a NUMBER token, it will execute the @OnNumber semantic action.

The @OnNumber semantic action is implemented by a piece of code that you specify and that will be called by the parser in this situation. In a sense, they are callbacks that you define and that are given to the parser. In Java, a semantic action is implemented as a method with the following signature:

public void onNumber(Symbol head, SemanticBody body);

There should be no return value. The semantic action is given two parameters:

  • head is the symbol corresponding to the head of the rule in which the parser is in. In our example, this would be the exp_atom symbol.
  • body is a handle giving access to the symbols found by the parser for the rule so far. In our example, there will be only one, a NUMBER token.

Note that if the semantic action is placed in the middle of the rule, the parser will only give in the body parameter the symbols that are before the semantic action. This is due to the fact that the parser is precisely at this point in the rule.

Updating the grammar

What we want to do in the case of our example is to implement a stack-based interpreter that will interpret on the fly the mathematical expression that is being parsed. To do so, the first step is to modify the grammar rules in order to introduce semantic actions that tell us what the parser is currently doing. We then amend the grammar rules as follow:

exp_atom  -> NUMBER^ @OnNumber
        | '('! exp^ ')'! ;
exp_factor  -> exp_atom^
        |  exp_factor '*'^ exp_atom @OnMult
        |  exp_factor '/'^ exp_atom @OnDiv;
exp_term  -> exp_factor^
        |  exp_term '+'^ exp_factor @OnPlus
        |  exp_term '-'^ exp_factor @OnMinus;

With the above semantic actions, the parser will tell us when it encountered a NUMBER token, or any of the operators. Our strategy is to rely on the fact that the above rules also implement the mathematic operators’ precedence. Hence, when we reach the @OnNumber semantic action we only have to push the value of the NUMBER token found on a stack and when we reach an operator we pop two values, execute the semantic of the operator and push the result back on the stack.

Now let's regenerate the lexer and parser for the new grammar:

  • On Windows: himecc.bat MathExp.gram -t:java or explicitly with net461/himecc.exe MathExp.gram -t:java
  • On Linux and MacOS: ./himecc MathExp.gram -t:java
  • On any OS, explicitly with .Net Core: dotnet netcore20/himecc.dll MathExp.gram -t:java
  • On any OS, explicitly with Mono: mono net461/himecc.exe MathExp.gram -t:java

Replace the old files in your test project by the newly generated ones.

Implement semantic actions

Now we have to implement the semantic actions in Java. Looking at the content of the MathExpParser.cs source for the parser, you can see the definition has been updated to include an innner Actions class with methods corresponding to our defined semantic actions:

public static class Actions {
    // The OnNumber semantic action
    public void onNumber(Symbol head, SemanticBody body) { }
    // The OnMult semantic action
    public void onMult(Symbol head, SemanticBody body) { }
    // The OnDiv semantic action
    public void onDiv(Symbol head, SemanticBody body) { }
    // The OnPlus semantic action
    public void onPlus(Symbol head, SemanticBody body) { }
    // The OnMinus semantic action
    public void onMinus(Symbol head, SemanticBody body) { }
}

In the same file we can see that the MathExpParser class can now be instantiated with an instance of this Actions class. This means that the methods in this passed instance will be called when semantic actions are executed.

public MathExpParser(MathExpLexer lexer, Actions actions)

What we want to do is extend the Actions class and give an instance of it to the parser. We will then define an Evaluator class that will extends Actions and implement a stack-based evaluator of arithmetic expression:

package mathexp;

import fr.cenotelie.hime.redist.SemanticBody;
import fr.cenotelie.hime.redist.Symbol;

import java.util.Stack;

public class Evaluator extends MathExpParser.Actions {

    private final Stack<Float> stack = new Stack<>();

    public float getResult() {
        return stack.peek();
    }

    // we override the base semantic actions (that do nothing)
    public void onNumber(Symbol head, SemanticBody body) {
        stack.push(Float.parseFloat(body.at(0).getValue()));
    }

    public void onMult(Symbol head, SemanticBody body) {
        float right = stack.pop();
        float left = stack.pop();
        stack.push(left * right);
    }

    public void onDiv(Symbol head, SemanticBody body) {
        float right = stack.pop();
        float left = stack.pop();
        stack.push(left / right);
    }

    public void onPlus(Symbol head, SemanticBody body) {
        float right = stack.pop();
        float left = stack.pop();
        stack.push(left + right);
    }

    public void onMinus(Symbol head, SemanticBody body) {
        float right = stack.pop();
        float left = stack.pop();
        stack.push(left - right);
    }
}

Now all we need is to modify the program to use the evaluator.

public static void main(String[] args) {
    Evaluator evaluator = new Evaluator();
    MathExpLexer lexer = new MathExpLexer("(2 + 3) * 4");
    MathExpParser parser = new MathExpParser(lexer, evaluator);
    ParseResult result = parser.parse();
    System.out.println(evaluator.getResult());
}

Test the evaluator

Then rebuild the test project and run.

mvn clean verify
and then
java -jar target/test-hime-1.0.0-jar-with-dependencies.jar

The output of the program should be 20.0.

This conclude this series of tutorial for how to use Hime-generated parsers in Java. For more information about Hime, head to the reference documentation (see the Table of Content on the left).