Get started
Grammars and edition
API Documentation
Release Notes
This tutorial explains through a simple example how to use the parser generator in Java. It is also available for C# and Rust. It assumes you are moderately familiar with regular expressions, and context-free grammars. The program here is to download the tooling, get a prepared grammar, compile it with himecc and use the generated parser. This tutorial will use the example of a small language for simple mathematical expressions. Using this language, one would be able to use the standards +, -, * and / operators, as well as ( ).
The toolchain for Hime can be found on the download page.
Download and extract the toolchain package.
You may want some editing support for Hime grammars. In this case, head to the editors line-up page. For an IDE experience, we recommend theVisual Studio Code extension.
Hime provides its own language for expressing context-free grammars. It is largely similar to the standard BNF form with some enhancements for the sake of expressivity. In the example below, the grammar is shown to have three sections:
grammar MathExp
{
options { }
terminals { }
rules { }
}
options
section lets you specify options for the lexer and parser generator.terminals
section is used to describe the regular expressions matching the terminals in the grammar. This section is optional.rules
section is used for stating the context-free rules of the grammar.In our example, we want to use numbers so we have to define regular expressions matching these:
terminals
{
INTEGER -> [1-9] [0-9]* | '0' ;
REAL -> INTEGER? '.' INTEGER (('e' | 'E') ('+' | '-')? INTEGER)?
| INTEGER ('e' | 'E') ('+' | '-')? INTEGER ;
NUMBER -> INTEGER | REAL ;
}
Here, we define three terminals:
INTEGER
, which is defined using a very simple regular expressionREAL
, which matches decimal numbers (23.15
) and floats using a power expression (12e-15
)NUMBER
, which is the union of the two formerBecause NUMBER
will match any INTEGER
and REAL
, we have to decide which of these is to be recognized by the lexer. The simple rule here is that the latest defined terminals have higher priority. Hence, the generated lexer will only be able to recognize NUMBER
and will never yield INTEGER
or REAL
. Also, we want to define a regular expression matching blank spaces that will be used as separators:
WHITE_SPACE -> U+0020 | U+0009 | U+000B | U+000C ;
SEPARATOR -> WHITE_SPACE+;
Here, the WHITE_SPACE
expression will match a single space, or tab, or vertical tab. The SEPARATOR
expression will match any string of at least one WHITE_SPACE
and has a higher priority than WHITE_SPACE
.
Now, we can use these terminals to write the context-free grammar rules:
rules
{
exp_atom -> NUMBER
| '('exp ')' ;
exp_factor -> exp_atom
| exp_factor '*' exp_atom
| exp_factor '/' exp_atom ;
exp_term -> exp_factor
| exp_term '+' exp_factor
| exp_term '-' exp_factor ;
exp -> exp_term ;
}
Here, we have 4 grammar rules, using 4 grammar variables and the NUMBER
terminal. Note that it is possible to add inline terminals in the grammar rules (e.g.: '+'
). These terminals will have higher priority over those defined in the terminals
section, although their definition is limited to simple text (no regular expressions here).
variable -> definition ;
You can also use the the *
, +
, and ?
operators as in regular expressions. The parser generator will automatically generates the additional corresponding rules. Use ( )
for grouping terms in rules and |
for alternatives.
Finally, we setup the grammar options as follow:
options
{
Axiom = "exp";
Separator = "SEPARATOR";
}
Axiom
option specifies which grammar variable is to be used as the root symbol for the grammar.Separator
option specifies which terminal is to be used as token separator in the lexer. Text matching the expression of the separator will automatically be discarded by the lexer.Now that we have a grammar, let's compile it to generate the parser. The toolchain package contains at its root useful front scripts that can be to invoke the himecc
compiler. On Windows, you should look for the himecc.bat
script. On Linux and MacOS you should look for the himecc
script. If this does not suit you, you may invoke the himecc.exe
assembly for your installed framework.
Compile the MathExp.gram
:
himecc.bat MathExp.gram -t:java
or explicitly withnet461/himecc.exe MathExp.gram -t:java
./himecc MathExp.gram -t:java
dotnet netcore20/himecc.dll MathExp.gram -t:java
mono net461/himecc.exe MathExp.gram -t:java
The tool will generate 4 files:
MathExpLexer.java
, the source file for the lexerMathExpLexer.bin
, the binary representation of the lexer’s automatonMathExpParser.java
, the source file for the parserMathExpParser.bin
, the binary representation of the parser’s automatonNote here that the default target for himecc
is the .Net platform; so that we have to specify Java as the target with the -t:java
option. For a complete guide to the options of himecc
, head to the reference page.
Setup a test project as a standard Maven project. Use the following project layout:
test/ +-> pom.xml +-> src/ +-> main/ +-> java/ | +-> math_exp/ | +-> Program.java | +-> MathExpLexer.java | +-> MathExpParser.java +-> resources/ +-> math_exp +-> MathExpLexer.bin +-> MathExpParser.bin
Set the minimal pom.xml
:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>fr.cenotelie.hime</groupId>
<artifactId>test-hime</artifactId>
<packaging>jar</packaging>
<version>1.0.0</version>
<dependencies>
<!-- reference to the Java runtime for generated parsers -->
<dependency>
<groupId>fr.cenotelie.hime</groupId>
<artifactId>hime-redist</artifactId>
<version>3.5.1</version>
<scope>compile</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
<configuration>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<archive>
<manifest>
<!-- register the main className for the program -->
<mainClass>math_exp.Program</mainClass>
</manifest>
</archive>
<descriptorRefs>
<!-- produce a jar with all dependencies (Hime runtime is embedded) -->
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Set the minimal Program.java
:
package math_exp; // default namespace for the parser is the grammar's name
import java.util.Arrays;
import fr.cenotelie.hime.redist.ASTNode;
import fr.cenotelie.hime.redist.ParseResult;
public className Program {
public static void main(String[] args) {
MathExpLexer lexer = new MathExpLexer("2 + 3");
MathExpParser parser = new MathExpParser(lexer);
ParseResult result = parser.parse();
ASTNode root = result.getRoot();
print(root, new boolean[]{});
}
private static void print(ASTNode node, boolean[] crossings) {
for (int i = 0; i < crossings.length - 1; i++)
System.out.print(crossings[i] ? "| " : " ");
if (crossings.length > 0)
System.out.print("+-> ");
System.out.println(node.toString());
for (int i = 0; i != node.getChildren().size(); i++) {
boolean[] childCrossings = Arrays.copyOf(crossings, crossings.length + 1);
childCrossings[childCrossings.length - 1] = (i < node.getChildren().size() - 1);
print(node.getChildren().get(i), childCrossings);
}
}
}
The parse tree, or AST (Abstract Syntax Tree) produced by the parser is available as a handle for the tree's root in the code above. Nodes in the tree have two important properties:
Symbol
contains the grammar's symbol attached to the node, a variable or a token, i.e. a piece of text matched by the lexer corresponding to one of the grammar's terminals.Children
is the list of the current node's children.For more info, look into the complete API documentation.
Build the test project:
mvn clean verify
Execute the test project:
java -jar target/test-hime-1.0.0-jar-with-dependencies.jar
The output of the program should a text printout of the produced syntax tree be as follow:
exp +-> exp_term +-> exp_term | +-> exp_factor | +-> exp_atom | +-> NUMBER = 2 +-> + = + +-> exp_factor +-> exp_atom +-> NUMBER = 3
The lexers and parser created in this manner are bound to parse the piece of text given to the lexer. In order to parse another piece of text, simply create a new lexer and parser in the same way. The two objects are lightweight so you should not have to worry about creating multiple ones. In this example, the lexer is given the full string that has to be parsed. Another constructor allows you to give a stream of text in the form of a InputStreamReader
.
Generated parsers also provides an API to easily visit the produced AST. First, one has to declare a new class that extends the provided Visitor
.
// Visitor class
class MyVisitor extends MathExpParser.Visitor {
@Override
public void onTerminalNumber(ASTNode node) {
System.out.println("Found value: " + node.getValue());
}
}
In this example, we only override the onTerminalNumber
method. But equivalent methods are also available to react to other terminals and variables. Finally, to visit the full AST, we call:
// Creates the lexer and parser
MathExpLexer lexer = new MathExpLexer("2 + 3");
MathExpParser parser = new MathExpParser(lexer);
ParseResult result = parser.parse();
MathExpParser.visit(result, new MyVisitor());
Note that the MathExpParser.visitASTNode
function is also available to start visiting AST at a specific node instead of the root.
Go to the next tutorial to see how to improved the produced parse trees by using tree actions.