Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PLY: quickly parsing long lists of items?

I'm working with a fairly simple parser in PLY, and one of my rules takes on the following form:

def p_things(p):
    '''
    things : thing things
    things : thing
    '''
    p[0] = [p[1]]
    if len(p) == 3:
        p[0] += p[2]

Input files are generally simple lists of things, and so the parsing itself is not complex. Some of my input files are very large, however (exceeding 100,000 lines fairly regularly, and over 1,000,000 in extreme cases). In profiling (via cProfile and pstats), the bulk of the runtime is taken up by repeated calls to p_things - presumably, one call for each item in a things list.

Is there a way to reduce this time, or a more efficient way to structure this rule? Most answers I've seen so far (and the canonical compilers info I've found) have listed this method as the generally accepted way to construct a list of parseable items, no matter the length.

like image 849
Tim Avatar asked Jun 20 '11 20:06

Tim


1 Answers

Turns out I'm forgetting some of my basic compilers theory. PLY is a LALR(1) parser, and so it's better to write the rule as:

def p_things(p):
    '''
    things : things thing
    things : thing
    '''
    if len(p) == 2:
        p[0] = [p[1]]
    else:
        p[0] = p[1]
        p[0].append(p[2])

Though it may look more verbose, there's actually a significant improvement - somewhere in either PLY or Python, the parser was able to apply some optimization on the left-recursive form. I've seen performance drop from exponential to linear on my larger input files; one sample, with over a million items in the things list, ran in under 20% of the time.

like image 177
Tim Avatar answered Oct 28 '22 10:10

Tim