
Pyparsing delimited list only returns first element

Here is my code:

l = "1.3E-2   2.5E+1"
parser = Word(alphanums + '+-.')
grammar = delimitedList(parser,delim='\t ')
print(grammar.parseString(l))

It returns:

['1.3E-2']

Obviously, I want both values, not just one. Any idea what is going on?

Overdrivr asked Dec 08 '22 05:12


2 Answers

As @dawg explains, delimitedList is intended for cases where you have an expression with separating non-whitespace delimiters, typically commas. Pyparsing implicitly skips over whitespace, so in the pyparsing world, what you are really seeing is not a delimitedList, but OneOrMore(realnumber). Also, parseString internally calls str.expandtabs on the provided input string, unless you call parseWithTabs() on the parser beforehand. Expanding tabs to spaces helps preserve columnar alignment of data when it is in tabular form, and when I originally wrote pyparsing, this was a prevalent use case.
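To see why a delimiter containing a tab can never match by default, here is a small standard-library sketch of what str.expandtabs does to the input from the question. It assumes the two numbers were separated by a single tab, and it only illustrates the expansion step, not pyparsing's internals.

```python
# Assumed input: the two values from the question separated by one tab.
line = "1.3E-2\t2.5E+1"

# expandtabs() replaces each tab with spaces up to the next tab stop
# (every 8 columns by default).
expanded = line.expandtabs()

print(repr(line))      # still contains \t
print(repr(expanded))  # the tab has become plain spaces
print('\t' in expanded)
```

Once the tab is gone, a delimiter expression that looks for a tab character has nothing left to match.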

If you have control over this data, then you might want to use a different delimiter than <TAB>, perhaps commas or semicolons. If you are stuck with this format, but determined to use pyparsing, then use OneOrMore.

As you move forward, you will also want to be more precise about the expressions you define and the variable names that you use. The name "parser" is not very informative, and the pattern of Word(alphanums+'+-.') will match a lot of things besides valid real values in scientific notation. I understand if you are just trying to get anything working, this is a reasonable first cut, and you can come back and tune it once you get something going. If in fact you are going to be parsing real numbers, here is an expression that might be useful:

realnum = Regex(r'[+-]?\d+\.\d*([eE][+-]?\d+)?').setParseAction(lambda t: float(t[0]))

Then you can define your grammar as "OneOrMore(realnum)", which is also a lot more self-explanatory. And the parse action will convert your strings to floats at parse time, which will save you a step later when actually working with the parsed values.
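Putting those pieces together, a minimal sketch might look like the following. The input string is an assumption (the question's two values separated by a tab), and the classic camelCase pyparsing names are used to match the rest of this thread.

```python
from pyparsing import OneOrMore, Regex

# Regex from the answer above; the parse action converts each match to a float.
realnum = Regex(r'[+-]?\d+\.\d*([eE][+-]?\d+)?').setParseAction(lambda t: float(t[0]))

# No explicit delimiter: pyparsing skips whitespace between matches on its own.
grammar = OneOrMore(realnum)

# Assumed sample input: two values separated by a tab.
values = grammar.parseString("1.3E-2\t2.5E+1").asList()
print(values)  # [0.013, 25.0]
```

Note that this regex requires a decimal point, so a bare integer such as "25" would not match; loosen the pattern if your data contains those.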

Good luck!

PaulMcG answered Dec 21 '22 02:12


It works if you switch to raw strings (the input then contains a literal backslash followed by t, not an actual tab character):

l = r"1.3E-2\t2.5E+1"
parser = Word(alphanums + '+-.')
grammar = delimitedList(parser, delim=r'\t')
print(grammar.parseString(l))

Prints:

['1.3E-2', '2.5E+1']

In general, delimitedList works with something like PDPDP, where P is the parse target and D is the delimiter or delimiting sequence.

You have delim='\t '. That specifically is a delimiter of 1 tab followed by 1 space; it is not either tab or space.
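A quick way to check what a delimiter string really contains is to list its characters. This sketch uses only plain Python strings; the variable names are just for illustration.

```python
# The delimiter from the question: the escape sequence is processed,
# so this is a tab character followed by a space.
delim = '\t '

# The raw-string delimiter: the backslash is kept as-is,
# so this is a backslash followed by the letter t.
raw_delim = r'\t'

print(len(delim), list(delim))
print(len(raw_delim), list(raw_delim))
```

Both strings are two characters long, and each is matched as one fixed sequence rather than as a set of alternative characters.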

dawg answered Dec 21 '22 02:12