 

Making a lexical Analyzer

I'm working with a Lexical Analyzer program right now and I'm using Java. I've been researching for answers on this problem but until now I failed to find any. Here's my problem:

Input:

System.out.println ("Hello World"); 

Desired Output:

Lexeme----------------------Token
System          [Key_Word]
.               [Object_Accessor]
out             [Key_Word]
.               [Object_Accessor]
println         [Key_Word]
(               [left_Parenthesis]
"Hello World"   [String_Literal]
)               [right_Parenthesis]
;               [statement_separator]

I'm still a beginner so I hope you guys can help me on this. Thanks.

KLoverated asked Jul 25 '13 02:07



1 Answer

You need neither ANTLR nor the Dragon book to write a simple lexical analyzer by hand. Even lexical analyzers for fuller languages (like Java) aren't terribly complicated to write by hand. Obviously if you have an industrial task you might want to consider industrial strength tools like ANTLR or some lex variant, but for the sake of learning how lexical analysis works, writing one by hand would likely prove to be a useful exercise. I'm assuming that this is the case, since you said you're still a beginner.

Here's a simple lexical analyzer, written in Java, that I wrote after seeing this question. It handles a subset of a Scheme-like language. I think the code is relatively easy to understand even if you've never seen a lexer before, simply because breaking a stream of characters (in this case a String) into a stream of tokens (in this case a List<Token>) isn't that hard. If you have questions I can try to explain in more depth.

import java.util.List;
import java.util.ArrayList;

/*
 * Lexical analyzer for Scheme-like minilanguage:
 * (define (foo x) (bar (baz x)))
 */
public class Lexer {
    public static enum Type {
        // This Scheme-like language has three token types:
        // open parens, close parens, and an "atom" type
        LPAREN, RPAREN, ATOM;
    }
    public static class Token {
        public final Type t;
        public final String c; // contents mainly for atom tokens
        // could have column and line number fields too, for reporting errors later
        public Token(Type t, String c) {
            this.t = t;
            this.c = c;
        }
        public String toString() {
            if(t == Type.ATOM) {
                return "ATOM<" + c + ">";
            }
            return t.toString();
        }
    }

    /*
     * Given a String, and an index, get the atom starting at that index
     */
    public static String getAtom(String s, int i) {
        int j = i;
        for( ; j < s.length(); ) {
            if(Character.isLetter(s.charAt(j))) {
                j++;
            } else {
                return s.substring(i, j);
            }
        }
        return s.substring(i, j);
    }

    public static List<Token> lex(String input) {
        List<Token> result = new ArrayList<Token>();
        for(int i = 0; i < input.length(); ) {
            switch(input.charAt(i)) {
            case '(':
                result.add(new Token(Type.LPAREN, "("));
                i++;
                break;
            case ')':
                result.add(new Token(Type.RPAREN, ")"));
                i++;
                break;
            default:
                if(Character.isWhitespace(input.charAt(i))) {
                    i++;
                } else {
                    String atom = getAtom(input, i);
                    i += atom.length();
                    result.add(new Token(Type.ATOM, atom));
                }
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        if(args.length < 1) {
            System.out.println("Usage: java Lexer \"((some Scheme) (code to) lex)\".");
            return;
        }
        List<Token> tokens = lex(args[0]);
        for(Token t : tokens) {
            System.out.println(t);
        }
    }
}

Example use:

~/code/scratch $ java Lexer ""
~/code/scratch $ java Lexer "("
LPAREN
~/code/scratch $ java Lexer "()"
LPAREN
RPAREN
~/code/scratch $ java Lexer "(foo)"
LPAREN
ATOM<foo>
RPAREN
~/code/scratch $ java Lexer "(foo bar)"
LPAREN
ATOM<foo>
ATOM<bar>
RPAREN
~/code/scratch $ java Lexer "(foo (bar))"
LPAREN
ATOM<foo>
LPAREN
ATOM<bar>
RPAREN
RPAREN

Once you've written one or two simple lexers like this, you will get a pretty good idea of how this problem decomposes. Then it would be interesting to explore how to use automated tools like lex. The theory behind regular-expression-based matchers is not too difficult, but it does take a while to fully understand. I think writing lexers by hand motivates that study and helps you come to grips with the problem better than diving into the theory behind converting regular expressions to finite automata (first regular expressions to NFAs, then NFAs to DFAs), etc. Wading into that theory can be a lot to take in at once, and it is easy to get overwhelmed.
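To connect this back to your original Java input: here is a hypothetical sketch of the regex-based style of lexer mentioned above, using java.util.regex. The class name, token categories, and classify helper are my own illustrative labels, not a standard API. Note that strictly speaking, System, out, and println are identifiers rather than keywords, so this sketch labels them Identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer for the asker's example input.
public class JavaLexerSketch {
    // Order matters: more specific alternatives (string literals) come first.
    private static final Pattern TOKEN = Pattern.compile(
        "\"[^\"]*\""                  // string literal
      + "|[A-Za-z_][A-Za-z0-9_]*"     // identifier (System, out, println, ...)
      + "|\\."                        // object accessor
      + "|\\("                        // left parenthesis
      + "|\\)"                        // right parenthesis
      + "|;"                          // statement separator
    );

    // Collect every match; whitespace matches no alternative and is skipped.
    public static List<String> lex(String input) {
        List<String> lexemes = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            lexemes.add(m.group());
        }
        return lexemes;
    }

    // Illustrative token names, loosely following the question's desired output.
    public static String classify(String lexeme) {
        if (lexeme.startsWith("\"")) return "String_Literal";
        if (lexeme.equals("("))      return "left_Parenthesis";
        if (lexeme.equals(")"))      return "right_Parenthesis";
        if (lexeme.equals(";"))      return "statement_separator";
        if (lexeme.equals("."))      return "Object_Accessor";
        return "Identifier";
    }

    public static void main(String[] args) {
        for (String lexeme : lex("System.out.println (\"Hello World\");")) {
            System.out.println(lexeme + "\t[" + classify(lexeme) + "]");
        }
    }
}
```

The one-regex-with-alternatives approach is roughly what lex-generated scanners do under the hood, except they compile the alternatives into a single DFA instead of trying each one in turn.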

Personally, while the Dragon book is good and very thorough, the coverage might not be the easiest to understand because it aims to be complete, not necessarily accessible. You might want to try some other compiler texts before opening up the Dragon book. Here are a few free books, which have pretty good introductory coverage, IMHO:

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

http://www.diku.dk/~torbenm/Basics/

Some articles on the implementation of regular expressions (automated lexical analysis usually uses regular expressions)

http://swtch.com/~rsc/regexp/

I hope that helps. Good luck.

michiakig answered Sep 22 '22 11:09