
Python tokenize sentence with optional key/val pairs

I'm trying to parse a line of text that contains a sentence, optionally followed by some key/value pairs on the same line. Not only are the key/value pairs optional, they are also dynamic: the keys are not known in advance. I'm looking for a result like this:

Input:

"There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

Output:

Values = {'theSentence' : "There was a cow at home.",
          'home' : "mary",
          'cowname' : "betsy",
          'date' : "10-jan-2013"
         }

Input:

"Mike ordered a large hamburger. lastname=Smith store=burgerville"

Output:

Values = {'theSentence' : "Mike ordered a large hamburger.",
          'lastname' : "Smith",
          'store' : "burgerville"
         }

Input:

"Sam is nice."

Output:

Values = {'theSentence' : "Sam is nice."}

Thanks for any input/direction. I know the sentences make this look like a homework problem, but I'm just a Python newbie. I suspect the solution involves a regex, but I'm not very good with regexes.

tazzytazzy asked Jul 22 '13 18:07


2 Answers

I'd use re.sub:

import re

s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"

d = {}

def add(m):
    # Record the pair, then return '' so the match is removed from the sentence.
    d[m.group(1)] = m.group(2)
    return ''

s = re.sub(r'(\w+)=(\S+)', add, s)
d['theSentence'] = s.strip()

print(d)
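
The same idea can be wrapped in a function so that it does not rely on a module-level dictionary. This is only a sketch: the names `parse_line`, `PAIR_RE` and `collect` are made up for illustration, and the pattern is the one used above.

```python
import re

# Same pattern as in the answer: a word, '=', then a run of non-whitespace.
PAIR_RE = re.compile(r'(\w+)=(\S+)')

def parse_line(line):
    """Return a dict with the sentence and any key=value pairs found in line."""
    values = {}

    def collect(match):
        # Store the pair and replace it with '' so only the sentence remains.
        values[match.group(1)] = match.group(2)
        return ''

    values['theSentence'] = PAIR_RE.sub(collect, line).strip()
    return values

for text in ("There was a cow at home. home=mary cowname=betsy date=10-jan-2013",
             "Mike ordered a large hamburger. lastname=Smith store=burgerville",
             "Sam is nice."):
    print(parse_line(text))
```

Note that a pair whose key is literally `theSentence` would be overwritten by the sentence itself.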

Here's a more compact version, if you prefer:

d = {}
d['theSentence'] = re.sub(r'(\w+)=(\S+)',
    lambda m: d.setdefault(m.group(1), m.group(2)) and '',
    s).strip()
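
One difference between the two versions is worth knowing about: plain assignment keeps the last value for a repeated key, while `setdefault` keeps the first. The small comparison below uses an input with a repeated key, which is an assumption for illustration and does not appear in the question.

```python
import re

# Hypothetical input with the key 'mood' given twice.
s = "Sam is nice. mood=happy mood=sad"

# Plain assignment: a later pair overwrites an earlier one.
last_wins = {}
def add(m):
    last_wins[m.group(1)] = m.group(2)
    return ''
re.sub(r'(\w+)=(\S+)', add, s)

# setdefault: the first value seen for a key is kept.
first_wins = {}
re.sub(r'(\w+)=(\S+)',
       lambda m: first_wins.setdefault(m.group(1), m.group(2)) and '',
       s)

print(last_wins)
print(first_wins)
```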

Or, maybe, findall is a better option:

rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
d = {
    a or 'theSentence': (b or c).strip()
    for a, b, c in re.findall(rx, s)
}
print(d)
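
To see why the comprehension uses `a or 'theSentence'` and `b or c`, it may help to look at what `findall` returns for this pattern. With three groups, each match is a 3-tuple, and the groups that did not take part in a match come back as empty strings. A quick sketch using the second input from the question:

```python
import re

rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
s = "Mike ordered a large hamburger. lastname=Smith store=burgerville"

# Each row is (key, value, sentence); only one side of the '|' is filled in.
rows = re.findall(rx, s)
for key, val, sentence in rows:
    print(repr(key), repr(val), repr(sentence))
```

The sentence row keeps its trailing space, which is why the comprehension calls `.strip()`.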
georg answered Oct 19 '22 22:10


If your sentence is guaranteed to end with a period and contains no other periods, you could use the following approach.

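>>> inputString = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
>>> Values = {}
>>> testList = inputString.split('.', 1)  # split only at the first period
>>> Values['theSentence'] = testList[0] + '.'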

For the rest of the values, split the remainder on whitespace and then split each piece on =:

>>> for elem in testList[1].split():
...     key, val = elem.split('=', 1)
...     Values[key] = val
...

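This gives you a Values dictionary like the one below (Values2 and Values3 being the results of the same steps applied to the other two inputs):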

>>> Values
{'date': '10-jan-2013', 'home': 'mary', 'cowname': 'betsy', 'theSentence': 'There was a cow at home.'}
>>> Values2
{'lastname': 'Smith', 'theSentence': 'Mike ordered a large hamburger.', 'store': 'burgerville'}
>>> Values3
{'theSentence': 'Sam is nice.'}
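
If you cannot rely on the period, a variation on the same split-based idea is to peel key=value tokens off the end of the line instead. This is a sketch rather than a drop-in replacement: `parse_trailing_pairs` is a made-up name, and it assumes the pairs always come last and that collapsing runs of whitespace in the sentence is acceptable.

```python
def parse_trailing_pairs(line):
    """Treat trailing tokens containing '=' as pairs; the rest is the sentence."""
    tokens = line.split()
    values = {}
    # Work backwards from the end for as long as tokens look like key=value.
    while tokens and '=' in tokens[-1]:
        key, val = tokens.pop().split('=', 1)
        values[key] = val
    # Whatever is left is the sentence (whitespace is normalised by the join).
    values['theSentence'] = ' '.join(tokens)
    return values

print(parse_trailing_pairs(
    "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"))
print(parse_trailing_pairs("Sam is nice."))
```

Because only trailing tokens are examined, an = sign inside the sentence itself is left alone as long as an ordinary word follows it.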
Sukrit Kalra answered Oct 19 '22 22:10