Using Antlr to parse date ranges in Java and Kotlin

I wanted to parse date ranges that can appear as e.g. “01.01.” or “01.01.-05.01.” or “01.01.-05.01./09.01.” or similar combinations. To handle all possible combinations correctly, I used Antlr to parse the dates.

First I created a grammar file named “Dates.g4” whose rules define what a valid date range is:

grammar Dates;
r: (element (divider? element)*);
element: (daterange | singledate);
daterange: date minus date;
singledate: date;
minus: '-' | '–';
divider: '/';
date: day '.' month ('.')?;
day: INT;
month: INT;
INT: [0-9]+;
WS: [ \t\r\n]+ -> skip ;

Let’s see what this does. The “grammar” line just defines the grammar’s name. The next line defines a parser rule “r” that consists of an “element” followed by any number of further elements, each optionally preceded by a “divider”. The next line defines what such an “element” is: either a “daterange” or a “singledate”. All other rules are defined the same way. A question mark makes the preceding item optional, i.e. it does not need to be in the parsed text, and an asterisk allows it to occur any number of times, including zero.

The rules in uppercase letters are lexer rules: instead of describing the structure of the text in terms of other rules, they define which characters form a token. Here “INT” matches one or more digits, and “WS” matches whitespace, which “-> skip” discards.
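
To check my understanding of which inputs rule “r” describes, it can help to write the same shape down as a regular expression. The following Java sketch is only an approximation for illustration and does not use Antlr at all; the class name DateRangeShape and the method looksValid are made up for this example. It assumes that the second “minus” alternative in the grammar is an en dash (U+2013) and that the whole input has to match. Antlr itself does not return a boolean like this but reports syntax errors and tries to recover.

```java
import java.util.regex.Pattern;

class DateRangeShape {
    // Optional whitespace between tokens, mirroring "WS -> skip".
    static final String S = "[ \\t\\r\\n]*";
    // INT: [0-9]+ (possessive, so a run of digits is never split, as in the lexer).
    static final String INT = "[0-9]++" + S;
    // date: day '.' month ('.')?
    static final String DATE = INT + "\\." + S + INT + "(?:\\." + S + ")?";
    // element: daterange | singledate, where minus is '-' or an en dash (assumed U+2013).
    static final String ELEMENT = DATE + "(?:[-\\u2013]" + S + DATE + ")?";
    // r: element (divider? element)*
    static final Pattern R =
            Pattern.compile(S + ELEMENT + "(?:(?:/" + S + ")?" + ELEMENT + ")*");

    // True if the whole text has the shape described by rule "r".
    static boolean looksValid(String text) {
        return R.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("01.01."));               // true
        System.out.println(looksValid("01.01.-05.01."));        // true
        System.out.println(looksValid("01.01.-05.01./09.01.")); // true
        System.out.println(looksValid("01.01.-"));              // false
    }
}
```

Because the divider is optional, “01.01 05.01” also has a valid shape: it is read as two single dates.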

I used IntelliJ IDEA with the Antlr plugin, so to generate the necessary Java classes from the *.g4 file above I just had to right-click the file and choose “Generate ANTLR Recognizer”.

This generates several classes. The output directory and package can be changed via “Configure ANTLR” in the same context menu.

The generated classes are easy to use, e.g. in Kotlin:

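import org.antlr.v4.runtime.CharStreams
import org.antlr.v4.runtime.CommonTokenStream

val lexer = DatesLexer(CharStreams.fromString(text))
val parser = DatesParser(CommonTokenStream(lexer))

val parsed = parser.r()
for (element in parsed.element()) {
    // work with element.daterange() or element.singledate() here
}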

Inside the loop you can now access the dates e.g. with

element.daterange()

and

element.singledate()

because, as defined in the *.g4 file above, an element contains either a “daterange” or a “singledate”. The accessor for the alternative that did not match returns null. As you can see, the generated functions use the names that were specified in the *.g4 file.
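
One thing to keep in mind: the grammar defines “day” and “month” simply as INT, so an input like “99.99.” has a valid shape even though it is not a real date. A check like the following Java sketch could be run on the text of those two rules (which the generated contexts provide via getText()). The class name DayMonth and the method toMonthDay are made up for this example, and using java.time.MonthDay is my assumption about a suitable target type, since the dates carry no year.

```java
import java.time.DateTimeException;
import java.time.MonthDay;
import java.util.Optional;

class DayMonth {
    // Turns the text of the grammar's "day" and "month" rules into a MonthDay.
    // Returns an empty Optional if the digits do not form a real calendar date.
    static Optional<MonthDay> toMonthDay(String dayText, String monthText) {
        try {
            int day = Integer.parseInt(dayText);
            int month = Integer.parseInt(monthText);
            return Optional.of(MonthDay.of(month, day));
        } catch (DateTimeException | NumberFormatException e) {
            // Out-of-range values (or digit runs too long for an int) end up here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toMonthDay("01", "01")); // Optional[--01-01]
        System.out.println(toMonthDay("99", "99")); // Optional.empty
    }
}
```

MonthDay accepts the 29th of February because it does not know the year, so a leap-year check would have to happen wherever the year becomes known.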