I have the following string in a report file:
"Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"
I would like to turn it into a Bunch() object or a dict, so that I can access the information inside (via either my_var.conditions or my_var["conditions"]).
This works very well with eval():
eval("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")
However, I would like to avoid using eval().
I have tried a couple of string substitutions to convert it to dict syntax and then parse it with json.loads(), but that looks very hackish and will break as soon as a future string contains a new field; e.g.:
"{"+"Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"[1:-1]+"}".replace("conditions=","'conditions':")
You get the idea.
Do you know if there is any better way to parse this?
This pyparsing code will define a parsing expression for your Bunch declaration.
from pyparsing import (pyparsing_common, Suppress, Keyword, Forward, quotedString,
Group, delimitedList, Dict, removeQuotes, ParseResults)
# define pyparsing parser for the Bunch declaration
LBRACK,RBRACK,LPAR,RPAR,EQ = map(Suppress, "[]()=")
integer = pyparsing_common.integer
real = pyparsing_common.real
ident = pyparsing_common.identifier
# define a recursive expression for nested lists
listExpr = Forward()
listItem = real | integer | quotedString.setParseAction(removeQuotes) | Group(listExpr)
listExpr << LBRACK + delimitedList(listItem) + RBRACK
# define an expression for the Bunch declaration
BUNCH = Keyword("Bunch")
arg_defn = Group(ident + EQ + listItem)
bunch_decl = BUNCH + LPAR + Dict(delimitedList(arg_defn))("args") + RPAR
Here is that parser run against your sample input:
# run the sample input as a test
sample = """Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'],
durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"""
bb = bunch_decl.parseString(sample)
# print the parsed output as-is
print(bb)
Gives:
['Bunch', [['conditions', ['s1', 's2', 's3', 's4', 's5', 's6']],
['durations', [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]]],
['onsets', [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]]]]
With pyparsing, you can also add a parse-time callback, so that pyparsing will do the tokens->Bunch conversion for you:
# define a simple placeholder class for Bunch
class Bunch(object):
    def __init__(self, **kwargs):
        # store every keyword argument as an attribute
        self.__dict__.update(kwargs)
    def __repr__(self):
        return "Bunch:(%s)" % ', '.join("%r: %s" % item for item in vars(self).items())
# add this as a parse action, and pyparsing will autoconvert the parsed data to a Bunch
bunch_decl.addParseAction(lambda t: Bunch(**t.args.asDict()))
Now the parser will give you an actual Bunch instance:
[Bunch:('durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'],
'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])]
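If adding a dependency is not an option, a standard-library alternative is to let Python's own ast module parse the text and convert only the literal values. This is a different technique from the pyparsing grammar above, shown here as a minimal sketch: parse_bunch and expected_name are hypothetical names, and it assumes the report always contains a single Bunch(...) call with keyword arguments whose values are plain literals.

```python
import ast

def parse_bunch(text, expected_name="Bunch"):
    """Turn 'Bunch(key=literal, ...)' into a dict without executing the text."""
    tree = ast.parse(text.strip(), mode="eval")
    call = tree.body
    # Accept only a bare call such as Bunch(...) with keyword arguments.
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == expected_name
            and not call.args):
        raise ValueError("expected a %s(key=value, ...) expression" % expected_name)
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # a **mapping argument has no name
            raise ValueError("unexpected ** argument")
        # literal_eval only accepts literals (numbers, strings, lists, dicts, ...)
        # and raises ValueError for anything else, such as a function call.
        result[kw.arg] = ast.literal_eval(kw.value)
    return result

sample = ("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], "
          "durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], "
          "onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")
info = parse_bunch(sample)
print(info["conditions"])
```

New fields in future strings are picked up without any change, and the resulting dict can be passed to a Bunch class with Bunch(**info) if attribute access is preferred.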
Here is my ugly piece of code, please check:
import re
import json
l = "Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"
# take the text between "Bunch(" and the closing ")"; no exec needed
body = l[len("Bunch("):-1]
# split on "=" and on the ", " that precedes each keyword;
# the lookahead keeps the first letter of the keyword intact
sb = re.split(r"=|, (?=[a-zA-Z])", body)
# wrap the keyword names in double quotes so they become JSON keys
temp = ['"{}"'.format(x) if x.isalpha() else x for x in sb]
temp2 = ','.join(temp)
temp3 = temp2.replace('",[', '":[')
# JSON requires double-quoted strings
temp4 = temp3.replace("'", '"')
temp5 = "{%s}" % temp4
rslt = json.loads(temp5)
Eventually, the output:
rslt
Out[12]:
{'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'],
'durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]}
rslt["conditions"]
Out[13]: ['s1', 's2', 's3', 's4', 's5', 's6']
Generally, I think re is the package you need, but due to my limited experience with it, I could not apply it well here. I hope someone else will give a more elegant solution.
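For what it is worth, the re idea can be condensed into one split, sketched below. The function name bunch_to_dict is hypothetical, and the sketch assumes every value is a list of numbers or single-quoted strings that contain no quotes, commas followed by name=, or other tricky characters; anything beyond that would need a real parser.

```python
import json
import re

def bunch_to_dict(text):
    # Peel off the "Bunch(" prefix and the final ")".
    match = re.fullmatch(r"\s*Bunch\((.*)\)\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a Bunch(...) string")
    body = match.group(1)
    # Split at every "name=" (optionally preceded by a comma); the capture
    # group makes re.split keep the names in the result list.
    parts = re.split(r"(?:^|,\s*)([A-Za-z_]\w*)=", body)
    # parts is ['', name1, value1, name2, value2, ...]
    names, values = parts[1::2], parts[2::2]
    # JSON wants double quotes, so swap them in before loading each value.
    return {name: json.loads(value.replace("'", '"'))
            for name, value in zip(names, values)}

line = ("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], "
        "durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], "
        "onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")
parsed = bunch_to_dict(line)
print(parsed["onsets"])
```

Because the keyword names are captured rather than hard-coded, a new field in a future string ends up as one more key in the dict.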
FYI, you said you could easily use eval to get what you want, but when I tried it, I got TypeError: 'str' object is not callable. Which Python version are you using? (I tried it on Python 2.7 and Python 3.3, and it worked on neither.)