
Parsing org-mode files in Javascript

I have been trying for some time now to write an org-mode parser in Javascript. I had no trouble at all parsing the outline (which I did in a few minutes), but parsing the actual content is far more difficult, and I'm having trouble with nested lists, for example.

* This is a heading
  P1 Start a paragraph here but since it is the first indentation level
the paragraph may have a lower indentation on the next line
    or a greater one for that matter.

  + LI1.1 I am beginning a list here
  + LI1.2 Here begins another list item
    which continues here
      and also here
  P2 but is broken here (this line becomes a paragraph
  outside of the first list).
  + LI2.1 P1 Second list item.
    - LI2.1.1 Inner list with a simple item
    - LI2.1.2 P1 and with an item containing several paragraphs.
      Here is the second line in the item, and now

      LI2.1.2 P2 I begin a new paragraph still in the same item. 
        The indentation can be only higher
    LI2.1 P2 but if the indentation is lower, it breaks the item, 
    (and the whole list), and this is a paragraph in the LI2.1
    list item

    - LI 2.2.1 You get the picture
  P3 Just plain text outside of the list.

(In the above example, the PX and LIX.Y markers are there only to show explicitly where new blocks begin; they would not be present in the actual document. P stands for paragraph and LI for list item. In the HTML world, PX would be the beginning of a <p> tag. The numbering is just to help keep track of the nesting and of the changes of list.)

I am wondering what strategy to use for parsing this kind of nested block structure with significant whitespace. Clearly I can parse line by line without any backtracking, so it should be quite simple, but for some reason I couldn't manage to do it. I looked for inspiration in Markdown parsers and similar tools that are supposed to have comparable nesting features, but the ones I saw seemed very hacky and full of regexes, and I hope to write something cleaner. The org-mode "grammar" is quite large when you come to think about it, so the parser will grow little by little, and I'd like the whole thing to stay maintainable and make it easy to plug in new features.

Can anyone with experience in parsing such things give me some pointers?

glmxndr asked Jun 17 '11 18:06



1 Answer

There is a Javascript org-mode parser available here.
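If you would rather write the block parser yourself, one common approach to this kind of indentation-sensitive nesting is to keep a stack of open list items and read the file line by line, with no backtracking. The sketch below is only an illustration of that idea, not the code of the parser mentioned above. It makes several assumptions: `parseBody` and the node shapes are hypothetical names, the input is the body under a single heading (the outline is already parsed), indentation uses spaces only, and bullets are limited to `-`, `+` and numbered markers. A line that is not indented past an open item closes that item, a blank line ends the current paragraph but keeps the item open, and a bullet joins the previous list only if that list is the last child of its parent and has the same indentation.

```javascript
// Hypothetical helper: turns the body lines under one heading into a tree of
// { type: 'section' | 'list' | 'item' | 'paragraph', ... } nodes.
function parseBody(lines) {
  const root = { type: 'section', indent: -1, children: [] };
  const stack = [root]; // open containers (section, then list items), innermost last
  let para = null;      // paragraph that still accepts continuation lines

  for (const line of lines) {
    // A blank line ends the paragraph but leaves the open items alone.
    if (line.trim() === '') { para = null; continue; }

    const indent = line.search(/\S/); // assumes spaces, not tabs

    // Close every open item that this line is not indented past.
    // The root has indent -1, so it is never popped.
    while (stack[stack.length - 1].indent >= indent) {
      stack.pop();
      para = null;
    }

    const parent = stack[stack.length - 1];
    const bullet = /^\s*(?:[-+]|\d+[.)])\s+(.*)$/.exec(line);

    if (bullet) {
      // Reuse the previous list only if it directly precedes this bullet
      // and sits at the same indentation; otherwise start a new list.
      let list = parent.children[parent.children.length - 1];
      if (!list || list.type !== 'list' || list.indent !== indent) {
        list = { type: 'list', indent, children: [] };
        parent.children.push(list);
      }
      para = { type: 'paragraph', text: bullet[1] };
      const item = { type: 'item', indent, children: [para] };
      list.children.push(item);
      stack.push(item);
    } else if (para) {
      para.text += ' ' + line.trim(); // continuation of the current paragraph
    } else {
      para = { type: 'paragraph', text: line.trim() };
      parent.children.push(para);
    }
  }
  return root;
}

// Usage on a shortened version of the example from the question.
const sample = [
  '  Start a paragraph here',
  'with a lower indentation on the next line',
  '',
  '  + first item',
  '  + second item',
  '    which continues here',
  '  but is broken here',
];
const tree = parseBody(sample);
console.log(tree.children.map(node => node.type));
// prints [ 'paragraph', 'list', 'paragraph' ]
```

Keeping all the indentation logic in the single `while` loop means that new block types (tables, source blocks, drawers) can be added as extra branches next to the bullet test without touching the nesting rules.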

mac answered Sep 22 '22 02:09
