
How to extract key=value pairs from a loosely formatted string

Tags:

python

I have strings in this format:

 2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH

Some of them look like this:

 2013-06-05T11:15:48.670 LASTNAME=Ga FIRSTNAME="Je " PERSONNELID=jega QID=Q10138202 READER_NAME="CAZ.1 ELEVATOR LOBBY DBL GLASS" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370481333.000 REGION=UTAH

I want to extract the values of PERSONNELID, REGION, ACCESS_TYPE, and EVENT_TIME_UTC.

I was going to use split(" "), but the READER_NAME and ACCESS_TYPE values contain spaces. Can I convert the string to JSON and look up values by key?

What is the best way to extract those values?

Thank you in advance

user1413449 asked Dec 02 '22 23:12


2 Answers

One hack I've found useful in the past is to use shlex.split:

>>> import shlex
>>> s = '2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH'
>>> split = shlex.split(s)
>>> split
['2013-06-05T11:01:02.955', 'LASTNAME=Jone', 'FIRSTNAME=Jason', 
'PERSONNELID=salalm', 'QID=231412', 'READER_NAME=CAZ.1 LOBBY LEFT TURNSTYLE OUT',
'ACCESS_TYPE=Access Granted', 'EVENT_TIME_UTC=1370480141.000', 'REGION=UTAH']

And then we can turn this into a dictionary:

>>> parsed = dict(k.split("=", 1) for k in split if '=' in k)
>>> parsed
{'EVENT_TIME_UTC': '1370480141.000', 'FIRSTNAME': 'Jason', 
'LASTNAME': 'Jone', 'REGION': 'UTAH', 'ACCESS_TYPE': 'Access Granted', 
'PERSONNELID': 'salalm', 'QID': '231412', 
'READER_NAME': 'CAZ.1 LOBBY LEFT TURNSTYLE OUT'}
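
To pull out only the four fields named in the question, the two steps above could be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the names WANTED and extract_fields are made up for illustration, and it assumes every line follows the quoting pattern shown in the question.

```python
import shlex

# Field names come from the question; the helper name is hypothetical.
WANTED = ("PERSONNELID", "REGION", "ACCESS_TYPE", "EVENT_TIME_UTC")

def extract_fields(line, wanted=WANTED):
    """Return the wanted name=value fields of one log line as a dict."""
    tokens = shlex.split(line)  # keeps double-quoted values together
    parsed = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
    # A field that is absent from the line comes back as None.
    return {name: parsed.get(name) for name in wanted}

# Second sample line from the question (the one with FIRSTNAME="Je ").
line = ('2013-06-05T11:15:48.670 LASTNAME=Ga FIRSTNAME="Je " PERSONNELID=jega '
        'QID=Q10138202 READER_NAME="CAZ.1 ELEVATOR LOBBY DBL GLASS" '
        'ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370481333.000 REGION=UTAH')
fields = extract_fields(line)
print(fields)
```

Using parsed.get rather than indexing is a design choice: a line that lacks one of the fields yields None for it instead of raising KeyError.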

As @abarnert points out, you can keep more of the information around if you want:

>>> dict(k.partition('=')[::2] for k in split)
{'2013-06-05T11:01:02.955': '', 'EVENT_TIME_UTC': '1370480141.000', 'FIRSTNAME': 'Jason', 'LASTNAME': 'Jone', 'REGION': 'UTAH', 'ACCESS_TYPE': 'Access Granted', 'PERSONNELID': 'salalm', 'QID': '231412', 'READER_NAME': 'CAZ.1 LOBBY LEFT TURNSTYLE OUT'}

Et cetera. The key point, as he nicely put it, is that the syntax you've shown looks a lot like minimal shell syntax. On the other hand, if other lines in your data violate the pattern you've shown, you might need to fall back to writing a custom parser. The shlex approach is handy when it applies, but it isn't as robust as you might want.
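
As one illustration of such a fallback, a regular expression can pick out the NAME=value pairs directly. This is only a sketch under assumptions: PAIR_RE and parse_pairs are hypothetical names, names are assumed to be word characters, and values are assumed to be either double-quoted or free of whitespace. One case where it differs from shlex is a lone apostrophe in a value (for example O'Brien), which makes shlex.split raise ValueError because it sees an unclosed quote.

```python
import re

# NAME=value, where the value is either "double quoted" or a run of
# non-whitespace characters. Both names here are hypothetical.
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')

def parse_pairs(line):
    """Collect NAME=value pairs from a line, ignoring everything else."""
    result = {}
    for name, quoted, bare in PAIR_RE.findall(line):
        # At most one of the two value groups is non-empty per match.
        result[name] = quoted or bare
    return result

s = ('2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm '
     'QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" '
     'ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH')
pairs = parse_pairs(s)
print(pairs["ACCESS_TYPE"], pairs["REGION"])
```

The leading timestamp is skipped here because it contains no "=", so it never matches the pattern.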

DSM answered Dec 09 '22 16:12


Looking for an existing parser is a good idea. If you can find a format that already describes your data, or that you can trivially convert your data into, you win.

In this case, converting to JSON seems like it'll be as much work as parsing in the first place.

But you're just looking to split into simple value and name=value components, where the value part can be quoted… those are the same rules as minimal shell syntax. So, shlex will do it for you:

>>> import shlex
>>> s = '2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH'
>>> shlex.split(s)
['2013-06-05T11:01:02.955',
 'LASTNAME=Jone',
 'FIRSTNAME=Jason',
 'PERSONNELID=salalm',
 'QID=231412',
 'READER_NAME=CAZ.1 LOBBY LEFT TURNSTYLE OUT',
 'ACCESS_TYPE=Access Granted',
 'EVENT_TIME_UTC=1370480141.000',
 'REGION=UTAH']

You will still need to separate each name=value pair into its name and value components, but that's just namevalue.split('=', 1). You have to do that as a separate step anyway, because some elements aren't name-value pairs (2013-06-05T11:01:02.955).

Of course, you can always choose to treat them as name-value pairs with empty values:

>>> dict(namevalue.partition('=')[::2] for namevalue in shlex.split(s))
{'2013-06-05T11:01:02.955': '',
 'ACCESS_TYPE': 'Access Granted',
 'EVENT_TIME_UTC': '1370480141.000',
 'FIRSTNAME': 'Jason',
 'LASTNAME': 'Jone',
 'PERSONNELID': 'salalm',
 'QID': '231412',
 'READER_NAME': 'CAZ.1 LOBBY LEFT TURNSTYLE OUT',
 'REGION': 'UTAH'}
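
If you would rather keep the bare elements apart instead of storing them as keys with empty values, something along these lines may work. It is a rough sketch: split_record is a hypothetical helper name, and it assumes that any token without an "=" is a bare value such as the leading timestamp.

```python
import shlex

def split_record(line):
    """Return (bare_tokens, pairs) for one line (hypothetical helper)."""
    bare, pairs = [], {}
    for token in shlex.split(line):
        name, sep, value = token.partition("=")
        if sep:
            # The token contained "=", so treat it as name=value.
            pairs[name] = value
        else:
            # No "=" at all, for example the leading timestamp.
            bare.append(token)
    return bare, pairs

s = ('2013-06-05T11:01:02.955 LASTNAME=Jone FIRSTNAME=Jason PERSONNELID=salalm '
     'QID=231412 READER_NAME="CAZ.1 LOBBY LEFT TURNSTYLE OUT" '
     'ACCESS_TYPE="Access Granted" EVENT_TIME_UTC=1370480141.000 REGION=UTAH')
bare, pairs = split_record(s)
print(bare)
print(pairs["PERSONNELID"])
```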

abarnert answered Dec 09 '22 15:12