Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Any way to speed up instaparse?

I'm trying to use instaparse on a dimacs file less than 700k in size, with the following grammar

<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'

calling like so

(def parser
  (insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))

and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?

like image 289
rwallace Avatar asked Oct 18 '22 11:10

rwallace


1 Answers

The grammar is wrong. It can't be satisfied.

  • Every file ends with a clause.
  • Every clause ends with a '0'.
  • The literal in the clause, being a greedy reg-exp,will eat the final '0'.

Conclusion: No clause will ever be found.

For example ...

=> (parser "60")
Parse error at line 1, column 3:
60
  ^
Expected one of:
"0"
#"\s+"
#"-\d+"
#"[1-9]\d*"

We can parse a literal

=> (parser "60" :start :literal)
("60")

... but not a clause

=> (parser "60" :start :clause)
Parse error at line 1, column 3:
60
  ^
Expected one of:
"0" (followed by end-of-string)
#"\s+"
#"-\d+"
#"[1-9]\d*"

Why is it so slow?

If there is a comment:

  • it can swallow the whole file;
  • or be broken at any 'c' character into successive comments;
  • or terminate at any point after the initial 'c'.

This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.


I suspect that this file is actually divided into lines. And that your problems arise from trying to conflate newlines with other forms of white-space.

May I gently point out that playing with a few tiny examples - which is all I've done - might have saved you a deal of trouble.

like image 177
Thumbnail Avatar answered Oct 21 '22 04:10

Thumbnail