Parsing in Emacs Lisp

Question

I'm writing a parser in Emacs Lisp. It's a parser for text files looking like this:

rule:
  int: 1, 2, 3, ...
  string: and, or, then, when
  text:
  ----------
  Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pellentesque
  in tellus. In pharetra consequat augue. In congue. Curabitur
  pellentesque iaculis eros. Proin magna odio, posuere sed, commodo nec,
  varius nec, tortor.
  ----------
  more: ...

rule:
  ...

I don't really care about the key (int, string, ...). I want the value. So for the file above int has value "1, 2, 3, ...", string "and, or, then, when" and text "Lorem ..." (excluding the dashes).

I'm thinking about two different solutions, but I don't which one to use. Should I:

create a simple parser that loops through all lines and for each line matches it with some regex and then group the parts I want out?
do a more sophisticated parser with a lexer and a parser?

Right now the files are quite simple and I guess I don't need to do something as advance as the second option. But these files may get a bit more complicated, so I want to make it easy to extend.

How would you solve this?

Dale Hagglund · Accepted Answer

Are you already familiar with recursive descent parsers? They're relatively easy to write by hand in your favourite programming language, which would include Emacs Lisp. For very simple parsing, you can often get by with looking-at and search-forward. These would also form the basis of any tokenizing routines that would be called by your recursive descent parser, or any other style of parser.

[11 Feb 2009] I added an example recursive descent parser in emacs lisp below. It parses simple arithmetic expressions including addition, subtraction, multiplication, division, exponentiation, and parenthesized sub-expressions. Right now, it assumes all tokens are in the global variable *tokens*, but if you modify gettok and peektok as necessary you can have them walk through a buffer. To use it as is, just try out the following:

(setq *token* '( 3 ^ 5 ^ 7 + 5 * 3 + 7 / 11))
(rdh/expr)
=> (+ (+ (^ 3 (^ 5 7)) (* 5 3)) (/ 7 11))

The parsing code follows.

(defun gettok ()
  (and *token* (pop *token*)))
(defun peektok ()
  (and *token* (car *token*)))

(defun rdh/expr ()
  (rdh/expr-tail (rdh/factor)))

(defun rdh/expr-tail (expr)
  (let ((tok (peektok)))
    (cond ((or (null tok)
           (equal tok ")"))
       expr)
      ((member tok '(+ -))
       (gettok)
       (let ((fac (rdh/factor)))
         (rdh/expr-tail (list tok expr fac))))
      (t (error "bad expr")))))

(defun rdh/factor ()
  (rdh/factor-tail (rdh/term)))

(defun rdh/factor-tail (fac)
  (let ((tok (peektok)))
    (cond ((or (null tok)
           (member tok '(")" + -)))
       fac)
      ((member tok '(* /))
       (gettok)
       (let ((term (rdh/term)))
         (rdh/factor-tail (list tok fac term))))
      (t (error "bad factor")))))

(defun rdh/term ()
  (let* ((prim (rdh/prim))
         (tok (peektok)))
    (cond ((or (null tok)
               (member tok '(")" + - / *)))
           prim)
          ((equal tok '^)
           (gettok)
           (list tok prim (rdh/term)))
          (t (error "bad term")))))

(defun rdh/prim ()
  (let ((tok (gettok)))
    (cond ((numberp tok) tok)
      ((equal tok "(")
       (let* ((expr (rdh/expr))
          (tok (peektok)))
         (if (not (equal tok ")"))
         (error "bad parenthesized expr")
           (gettok)
           expr)))
      (t (error "bad prim")))))

Alex Ott · Answer

for parser stuff look to the Semantic library from CEDET project

Parsing in Emacs Lisp

Tags:

emacs

parsing

elisp

rejeep

2 Answers

Dale Hagglund

Alex Ott

Recent Activity

Donate For Us

Parsing in Emacs Lisp

Tags:

emacs

parsing

elisp

rejeep

2 Answers

Dale Hagglund

Alex Ott

Related questions

Recent Activity

Donate For Us