What is the best way in Python to parse these results? I have tried regular expressions but can't get them to work. I am looking for a dictionary with title, author, etc. as keys.
@article{perry2000epidemiological,
title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
journal={Journal of public health},
volume={22},
number={3},
pages={427--434},
year={2000},
publisher={Oxford University Press}
}
This looks like a BibTeX citation. You could parse it like this:
>>> import re
>>> kv = re.compile(r'\b(?P<key>\w+)=\{(?P<value>[^}]+)\}')
>>> citation = """
... @article{perry2000epidemiological,
... title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
... author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
... journal={Journal of public health},
... volume={22},
... number={3},
... pages={427--434},
... year={2000},
... publisher={Oxford University Press}
... }
... """
>>> dict(kv.findall(citation))
{'author': 'Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others',
'journal': 'Journal of public health',
'number': '3',
'pages': '427--434',
'publisher': 'Oxford University Press',
'title': 'An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study',
'volume': '22',
'year': '2000'}
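The regex uses two named capturing groups (mainly just to visually denote what's what). The key is one or more word characters with a word boundary on the left and a literal equals sign on the right; the value is whatever sits between the curly brackets that follow. Because the pattern has two groups, findall returns (key, value) tuples, which dict() accepts directly. You can use
[^}]
conveniently as long as you don't expect to have "nested" curly brackets. In other words, the values are just one or more of any characters that aren't closing curly brackets, inside of curly brackets.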
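If nested curly brackets do turn up (BibTeX titles often wrap letters in braces to protect capitalisation), a character class that stops at the first closing bracket will cut the value short. One possible workaround, sketched here under assumptions, is to find each field's opening bracket with a regex and then count bracket depth by hand. The helper name parse_fields, the FIELD_START pattern and the sample entry are all hypothetical, and backslash-escaped brackets are not handled.

```python
import re

# Hypothetical helper pattern: matches `key={` or `key = {` and ends
# just after the opening bracket of the value.
FIELD_START = re.compile(r'\b(\w+)\s*=\s*\{')


def parse_fields(entry):
    """Return a dict of field name -> value, allowing nested brackets in values."""
    fields = {}
    pos = 0
    while True:
        match = FIELD_START.search(entry, pos)
        if match is None:
            break
        depth = 1                  # already inside the value's opening bracket
        i = match.end()
        while i < len(entry) and depth > 0:
            if entry[i] == '{':
                depth += 1
            elif entry[i] == '}':
                depth -= 1
            i += 1
        if depth != 0:
            break                  # unbalanced brackets: stop rather than guess
        # entry[i - 1] is the bracket that closed the value
        fields[match.group(1)] = entry[match.end():i - 1]
        pos = i
    return fields


# Made-up entry for illustration only; the nested brackets are the point.
sample = """
@article{example2000nested,
title={The {L}eicestershire {MRC} Incontinence Study},
year={2000}
}
"""
print(parse_fields(sample))
```

For the citation in the question, which has no nested brackets, this should give the same keys and values as the findall approach. For anything more involved than this, a dedicated BibTeX parsing library is probably the safer choice.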