
pyparsing ambiguity

I'm trying to parse some text using pyparsing. The problem is that my names can contain whitespace. My input might look like this. First, a list of names:

Joe
bob
Jimmy X
grjiaer-rreaijgr Y

Then, things they do:

Joe A
bob B
Jimmy X C

The problem, of course, is that a thing they do can be the same as the end of the name:

Jimmy X X
grjiaer-rreaijgr Y Y

How can I create a parser for the action lines? The output of parsing Joe A should be [Joe, A]. The output of parsing Jimmy X C should be [Jimmy X, C], of Jimmy X X - [Jimmy X, X]. That is, [name, action] pairs.

If I create my name parser naively, meaning something like OneOrMore(Regex(r"\S+")), then it will match the entire line, giving me [Jimmy X X], followed by a parsing error for not seeing an action (since the action was already consumed by the name parser).
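To make the greedy-consumption problem concrete, here is a minimal sketch in plain Python, using the standard `re` module rather than pyparsing. It assumes, and this is only an assumption, that an action is always a single whitespace-free token at the end of the line; `split_action_line` is a hypothetical helper name. Unlike the naive repetition described above, a regular expression backtracks, so the greedy name group gives the last token back to the action group.

```python
import re

# Assumption: the action is exactly one whitespace-free token at the end of the line.
# The name group is greedy, but the regex engine backtracks so that the final
# token is left over for the action group.
ACTION_LINE = re.compile(r"^\s*(?P<name>\S+(?:\s+\S+)*)\s+(?P<action>\S+)\s*$")


def split_action_line(line):
    """Return [name, action], or None if the line has fewer than two tokens."""
    match = ACTION_LINE.match(line)
    if match is None:
        return None
    return [match.group("name"), match.group("action")]


for line in ["Joe A", "Jimmy X C", "Jimmy X X", "grjiaer-rreaijgr Y Y"]:
    print(split_action_line(line))
```

Under that assumption, Jimmy X X splits into [Jimmy X, X]. The sketch stops working as soon as an action may itself contain whitespace, because nothing in the line then marks where the name ends.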

NOTE: Sorry for the ambiguous phrasing earlier that made this look like an NLP question.

Claudiu asked Nov 20 '25 18:11


1 Answer

You pretty much need more than a simple parser. Parsers use the symbols in a string to define which pieces of the string represent different elements of a grammar. This is why FM asked for some clue to indicate how you know what part is the name and what part is the rest of the sentence. If you could say that names are made up of one or more capitalized words, then the parser would know when the name stops and the rest of the sentence starts.
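As a sketch of what such a clue could look like: if the list of valid names is known before the action lines are read (an assumption here, based on the sample input listing the names first), the name can be recognized by lookup rather than by its shape. The example below is plain Python, not pyparsing, and `KNOWN_NAMES` and `parse_action` are hypothetical names introduced for illustration.

```python
# Assumption: the set of valid names is known before the action lines are parsed.
KNOWN_NAMES = ["Joe", "bob", "Jimmy X", "grjiaer-rreaijgr Y"]


def parse_action(line, names=KNOWN_NAMES):
    """Return [name, action] using the longest known name that starts the line."""
    text = line.strip()
    # Try longer names first so "Jimmy X" wins over a shorter name such as "Jimmy".
    for name in sorted(names, key=len, reverse=True):
        if text == name or not text.startswith(name):
            continue  # either no action follows, or the line starts differently
        rest = text[len(name):]
        # The name must end at a whitespace boundary, not in the middle of a word.
        if rest[0].isspace():
            return [name, rest.strip()]
    raise ValueError(f"no known name at start of line: {line!r}")


for line in ["Joe A", "Jimmy X X", "grjiaer-rreaijgr Y Y", "Joe decides to eat"]:
    print(parse_action(line))
```

With the names known, the rest of the line can be arbitrary, so multi-word actions are handled as well. Without a name list or some other marker, the boundary remains ambiguous.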

But what about a name like "jimmy foo decides"? How can the parser know, just by looking at the symbols in "decides", whether "decides" is part of the name? Even a human reading your "jimmy foo decides decides to eat" sentence would have trouble determining where the name stops, and whether this was some sort of typo.

If your input is really this unpredictable, then you need to use a tool such as the NLTK (Natural Language Toolkit). I've not used it myself, but it approaches this problem from the standpoint of parsing sentences in a language, as opposed to trying to parse structured data or mathematical formats.

I would not recommend pyparsing for this kind of language interpretation.

PaulMcG answered Nov 23 '25 07:11


