Lexical Analyser In Java

Tags: java, lexer

I have been trying to write a simple lexical analyzer in Java.

The file Token.java looks as follows:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Token {

    TK_MINUS ("-"), 
    TK_PLUS ("\\+"), 
    TK_MUL ("\\*"), 
    TK_DIV ("/"), 
    TK_NOT ("~"), 
    TK_AND ("&"),  
    TK_OR ("\\|"),  
    TK_LESS ("<"),
    TK_LEG ("<="),
    TK_GT (">"),
    TK_GEQ (">="), 
    TK_EQ ("=="),
    TK_ASSIGN ("="),
    TK_OPEN ("\\("),
    TK_CLOSE ("\\)"), 
    TK_SEMI (";"), 
    TK_COMMA (","), 
    TK_KEY_DEFINE ("define"), 
    TK_KEY_AS ("as"),
    TK_KEY_IS ("is"),
    TK_KEY_IF ("if"), 
    TK_KEY_THEN ("then"), 
    TK_KEY_ELSE ("else"), 
    TK_KEY_ENDIF ("endif"),
    OPEN_BRACKET ("\\{"),
    CLOSE_BRACKET ("\\}"),
    DIFFERENT ("<>"),

    STRING ("\"[^\"]+\""),
    INTEGER ("\\d"), 
    IDENTIFIER ("\\w+");

    private final Pattern pattern;

    Token(String regex) {
        pattern = Pattern.compile("^" + regex);
    }

    int endOfMatch(String s) {
        Matcher m = pattern.matcher(s);

        if (m.find()) {
            return m.end();
        }
        return -1;
    }
}

The lexer, Lexer.java, is as follows:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

public class Lexer {
    private StringBuilder input = new StringBuilder();
    private Token token;
    private String lexema;
    private boolean exausthed = false;
    private String errorMessage = "";
    private Set<Character> blankChars = new HashSet<Character>();

    public Lexer(String filePath) {
        try (Stream<String> st = Files.lines(Paths.get(filePath))) {
            st.forEach(input::append);
        } catch (IOException ex) {
            exausthed = true;
            errorMessage = "Could not read file: " + filePath;
            return;
        }

        blankChars.add('\r');
        blankChars.add('\n');
        blankChars.add((char) 8);
        blankChars.add((char) 9);
        blankChars.add((char) 11);
        blankChars.add((char) 12);
        blankChars.add((char) 32);

        moveAhead();
    }

    public void moveAhead() {
        if (exausthed) {
            return;
        }

        if (input.length() == 0) {
            exausthed = true;
            return;
        }

        ignoreWhiteSpaces();

        if (findNextToken()) {
            return;
        }

        exausthed = true;

        if (input.length() > 0) {
            errorMessage = "Unexpected symbol: '" + input.charAt(0) + "'";
        }
    }

    private void ignoreWhiteSpaces() {
        int charsToDelete = 0;

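        // Stop at the end of the input so trailing whitespace cannot cause an out-of-bounds read.
        while (charsToDelete < input.length() && blankChars.contains(input.charAt(charsToDelete))) {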
            charsToDelete++;
        }

        if (charsToDelete > 0) {
            input.delete(0, charsToDelete);
        }
    }

    private boolean findNextToken() {
        for (Token t : Token.values()) {
            int end = t.endOfMatch(input.toString());

            if (end != -1) {
                token = t;
                lexema = input.substring(0, end);
                input.delete(0, end);
                return true;
            }
        }

        return false;
    }

    public Token currentToken() {
        return token;
    }

    public String currentLexema() {
        return lexema;
    }

    public boolean isSuccessful() {
        return errorMessage.isEmpty();
    }

    public String errorMessage() {
        return errorMessage;
    }

    public boolean isExausthed() {
        return exausthed;
    }
}

It can be tested with Try.java as follows:

public class Try {

    public static void main(String[] args) {

        Lexer lexer = new Lexer("C:/Users/Input.txt");

        System.out.println("Lexical Analysis");
        System.out.println("-----------------");
        while (!lexer.isExausthed()) {
            System.out.printf("%-18s :  %s \n",lexer.currentLexema() , lexer.currentToken());
            lexer.moveAhead();
        }

        if (lexer.isSuccessful()) {
            System.out.println("Ok! :D");
        } else {
            System.out.println(lexer.errorMessage());
        }
    }
}

Say Input.txt contains:

define mine 
a=1000;
b=23.5;

The output I expect is:

define : TK_KEYWORD
mine : IDENTIFIER
a : IDENTIFIER
= : TK_ASSIGN
1000 : INTEGER
; : TK_SEMI
b : IDENTIFIER
= : TK_ASSIGN
23.5 : REAL

But the issue I am facing is that it treats each digit as a separate token:

1 INTEGER
0 INTEGER
0 INTEGER
0 INTEGER

It also doesn't recognize real numbers. I get:

Unexpected symbol: '.'

What are the changes needed to get the expected results?

asked Mar 28 '17 11:03

Vicky

1 Answer

Your pattern to match an integer is:

INTEGER ("\\d"),

That matches exactly one digit.

If you want to match one or more digits, go for

INTEGER ("\\d+"),

for example.
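To see the difference, here is a minimal sketch that applies both patterns to the start of `1000;`. The class name `DigitPatternDemo` and the two-argument `endOfMatch` helper are made up for this illustration; the helper only mirrors what `Token.endOfMatch` in the question does (anchor the regex with `^` and return the end index of the match).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitPatternDemo {

    // Hypothetical helper modelled on Token.endOfMatch from the question:
    // anchors the regex at the start of the input and returns the index
    // just past the match, or -1 if the input does not start with a match.
    static int endOfMatch(String regex, String input) {
        Matcher m = Pattern.compile("^" + regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // "\\d" consumes a single digit, so only "1" is taken from "1000;".
        System.out.println(endOfMatch("\\d", "1000;"));  // prints 1

        // "\\d+" consumes the whole run of digits, "1000".
        System.out.println(endOfMatch("\\d+", "1000;")); // prints 4
    }
}
```

With `\\d` the lexer removes one character per call to `moveAhead()`, which is why each digit shows up as its own INTEGER token.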

And, for completeness, the missing pattern for floating-point numbers could look like

REAL ("(\\d+)\\.\\d+")

as the comments pointed out. Or

REAL ("(\\d*)\\.\\d+")

to allow for

.23

too - if that is what you are looking for!
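One thing to watch for: the `findNextToken()` loop in the question takes the first enum constant whose pattern matches, in declaration order. If INTEGER matches one or more digits and is declared before REAL, it would consume the `23` of `23.5` before REAL is ever tried. The sketch below assumes REAL is placed before INTEGER; `NumberTokenDemo`, `NumToken` and `firstMatch` are hypothetical names for a trimmed-down version of the question's code, not part of the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberTokenDemo {

    // Hypothetical, reduced version of the question's Token enum.
    // REAL is declared before INTEGER so the first-match loop tries it first.
    enum NumToken {
        REAL("\\d+\\.\\d+"),
        INTEGER("\\d+");

        private final Pattern pattern;

        NumToken(String regex) {
            pattern = Pattern.compile("^" + regex);
        }

        // Same contract as Token.endOfMatch in the question.
        int endOfMatch(String s) {
            Matcher m = pattern.matcher(s);
            return m.find() ? m.end() : -1;
        }
    }

    // Same strategy as Lexer.findNextToken: the first constant, in
    // declaration order, whose pattern matches the start of the input wins.
    static String firstMatch(String input) {
        for (NumToken t : NumToken.values()) {
            int end = t.endOfMatch(input);
            if (end != -1) {
                return input.substring(0, end) + " : " + t;
            }
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("23.5;")); // prints 23.5 : REAL
        System.out.println(firstMatch("1000;")); // prints 1000 : INTEGER
    }
}
```

For `1000;` the REAL pattern fails because no `.` follows the digits, so the loop falls through to INTEGER.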

answered Oct 29 '22 09:10

GhostCat
