I have strings in this format:
2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH
Some of them look like this:
2013-06-05T11:15:48.670 LASTNAME=Ga FIRSTNAME="Je " PERSONNELID=jega QID=Q10138202 READER_NAME="CAZ.1 ELEVATOR LOBBY DBL GLASS" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370481333.000 REGION=UTAH
I want to extract the values of PERSONNELID, REGION, ACCESS_TYPE and EVENT_TIME_UTC.
I was going to use split(" "), but the READER_NAME and ACCESS_TYPE values contain spaces. Can I convert the string to JSON and search by key?
What is the best way to extract those values?
Thank you in advance
One hack I've found useful in the past is to use shlex.split:
>>> import shlex
>>> s = '2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH'
>>> split = shlex.split(s)
>>> split
['2013-06-05T11:01:02.955', 'LASTNAME=Jone', 'FIRSTNAME=Jason',
'PERSONNELID=salalm', 'QID=231412', 'READER_NAME=CAZ.1 LOBBY LEFT TURNSTYLE OUT',
'ACCESS_TYPE=Access Granted', 'EVENT_TIME_UTC=1370480141.000', 'REGION=UTAH']
And then we can turn this into a dictionary:
>>> parsed = dict(k.split("=", 1) for k in split if '=' in k)
>>> parsed
{'EVENT_TIME_UTC': '1370480141.000', 'FIRSTNAME': 'Jason',
'LASTNAME': 'Jone', 'REGION': 'UTAH', 'ACCESS_TYPE': 'Access Granted',
'PERSONNELID': 'salalm', 'QID': '231412',
'READER_NAME': 'CAZ.1 LOBBY LEFT TURNSTYLE OUT'}
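To get just the four values the question asks for, the parsed dictionary can be filtered by key. The following is a minimal sketch; the helper name extract_fields and the WANTED tuple are made up for illustration, and it assumes every line follows the pattern shown in the question.

```python
import shlex

# The four keys the question asks for (hypothetical constant name).
WANTED = ("PERSONNELID", "REGION", "ACCESS_TYPE", "EVENT_TIME_UTC")

def extract_fields(line, wanted=WANTED):
    """Return only the requested name=value fields from one log line."""
    tokens = shlex.split(line)
    # Skip bare tokens such as the leading timestamp.
    parsed = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
    # .get() gives None for a key that is missing from this line.
    return {key: parsed.get(key) for key in wanted}

line = ('2013-06-05T11:15:48.670 LASTNAME=Ga FIRSTNAME="Je " PERSONNELID=jega '
        'QID=Q10138202 READER_NAME="CAZ.1 ELEVATOR LOBBY DBL GLASS" '
        'ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370481333.000 REGION=UTAH')
print(extract_fields(line))
```

All values come back as strings, so EVENT_TIME_UTC would still need a float() call if you want to do arithmetic on it.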
As @abarnert points out, you can keep more of the information around if you want:
>>> dict(k.partition('=')[::2] for k in split)
{'2013-06-05T11:01:02.955': '', 'EVENT_TIME_UTC': '1370480141.000', 'FIRSTNAME': 'Jason', 'LASTNAME': 'Jone', 'REGION': 'UTAH', 'ACCESS_TYPE': 'Access Granted', 'PERSONNELID': 'salalm', 'QID': '231412', 'READER_NAME': 'CAZ.1 LOBBY LEFT TURNSTYLE OUT'}
Et cetera. The key point, as he nicely put it, is that the syntax you've shown looks a lot like minimal shell syntax. On the other hand, if other parts of your data violate the pattern you've shown, you might want to fall back to writing a custom parser. The shlex approach is handy when it applies but isn't as robust as you might want.
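One place where shlex can bite is a stray quote character: shlex.split raises ValueError on input such as LASTNAME=O'Brien because it sees an unclosed quotation. As a rough sketch of what a custom fallback might look like, here is a regex-based version. The pattern and the name parse_pairs are illustrative only, and it assumes names are word characters and values are either double-quoted or free of whitespace.

```python
import re

# NAME="quoted value" or NAME=bare_value; only double quotes are treated
# as quoting here (an assumption about the data).
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')

def parse_pairs(line):
    """Collect NAME=value pairs from a line, ignoring bare tokens."""
    result = {}
    for match in PAIR_RE.finditer(line):
        name, quoted, bare = match.groups()
        # Exactly one of the two value groups takes part in each match.
        result[name] = quoted if quoted is not None else bare
    return result

print(parse_pairs('2013-06-05T11:01:02.955 LASTNAME=O\'Brien ACCESS_TYPE="Access Granted"'))
```

Unlike shlex, this does not understand backslash escapes or single-quoted values, so it trades one set of edge cases for another.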
Looking for an existing parser is a good idea. If you can find a format that already describes your data, or that you can trivially convert your data into, you win.
In this case, converting to JSON seems like it'll be as much work as parsing in the first place.
But you're just looking to split into simple value and name=value components, where the value part can be quoted… those are the same rules as minimal shell syntax. So, shlex will do it for you:
>>> import shlex
>>> s = '2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH'
>>> shlex.split(s)
['2013-06-05T11:01:02.955',
'LASTNAME=Jone',
'FIRSTNAME=Jason',
'PERSONNELID=salalm',
'QID=231412',
'READER_NAME=CAZ.1 LOBBY LEFT TURNSTYLE OUT',
'ACCESS_TYPE=Access Granted',
'EVENT_TIME_UTC=1370480141.000',
'REGION=UTAH']
You will still need to separate each name=value pair out into name and value components, but that's just namevalue.split('=', 1). But it's pretty much implicit that you need to do that separately, given that you've got some elements that aren't name-value pairs (2013-06-05T11:01:02.955).
Of course you can always choose to treat them as name-value pairs with empty values:
>>> dict(namevalue.partition('=')[::2] for namevalue in shlex.split(s))
{'2013-06-05T11:01:02.955': '',
'ACCESS_TYPE': 'Access Granted',
'EVENT_TIME_UTC': '1370480141.000',
'FIRSTNAME': 'Jason',
'LASTNAME': 'Jone',
'PERSONNELID': 'salalm',
'QID': '231412',
'READER_NAME': 'CAZ.1 LOBBY LEFT TURNSTYLE OUT',
'REGION': 'UTAH'}
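If you would rather keep the leading timestamp apart from the name-value pairs instead of storing it as a key with an empty value, something like the following might work. It is only a sketch: parse_event is a made-up name, the timestamp format is inferred from the two sample lines, and treating EVENT_TIME_UTC as Unix epoch seconds is an assumption based on its name.

```python
import shlex
from datetime import datetime, timezone

def parse_event(line):
    """Split one log line into (logged_at, fields)."""
    tokens = shlex.split(line)
    bare = [tok for tok in tokens if "=" not in tok]
    fields = dict(tok.split("=", 1) for tok in tokens if "=" in tok)

    # Assumption: the first bare token is the log timestamp.
    logged_at = None
    if bare:
        logged_at = datetime.strptime(bare[0], "%Y-%m-%dT%H:%M:%S.%f")

    # Assumption: EVENT_TIME_UTC holds seconds since the Unix epoch.
    if "EVENT_TIME_UTC" in fields:
        fields["EVENT_TIME_UTC"] = datetime.fromtimestamp(
            float(fields["EVENT_TIME_UTC"]), tz=timezone.utc
        )
    return logged_at, fields

s = ('2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm '
     'QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" '
     'ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH')
logged_at, fields = parse_event(s)
print(logged_at, fields["PERSONNELID"], fields["EVENT_TIME_UTC"])
```

The returned logged_at is a naive datetime because the sample lines carry no timezone information.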