 

Python regex: split a string while keeping the delimiter with its value

Tags:

python

regex

I'm trying to parse a text file containing name:value elements into a list of "name:value" strings. Here's the twist: the values will sometimes be multiple words or even span multiple lines, and the delimiters (the names) are not a fixed set of words. Here's an example of what I'm trying to work with:

post = "price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"

What I want to return is...

["price:44.55", "name:John Doe", "title:Super Widget", "description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Here's what I've tried so far...

details = re.findall(r'[\w]+:.*', post, re.DOTALL)
["price:44.55 name:John Doe title:Super Widget description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Not what I want. Or...

details = re.findall(r'[\w]+:.*?', post, re.DOTALL)
["price:", "name:", "title:", "description:"]

Not what I want. Or...

details = re.split(r'([\w]+:)', post)
["", "price:", "44.55 ", "name:", "John Doe ", "title:", "Super Widget ", "description:", "This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!"]

Which is closer, but still no dice. (I can deal with an empty list item.) So, basically, my question is: how do you keep the delimiter with its value in re.split(), or how do you keep re.findall() from being either too greedy or too stingy?
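The capture-group split above already keeps every delimiter, just as a separate list item. A minimal sketch of one way to finish that attempt, assuming whatever precedes the first name can be discarded (an empty string here): take every other item as a name, glue it onto the value that follows it, and trim the whitespace that separated it from the next name. The names `parts`, `names`, `values` and `details` are illustrative only.

```python
import re

# Same sample as in the question; the \r\n is a real CR/LF inside the value.
post = ("price:44.55 name:John Doe title:Super Widget description:This widget slices, "
        "dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!")

# The capturing group makes re.split() return the delimiters too:
# ['', 'price:', '44.55 ', 'name:', 'John Doe ', ...]
parts = re.split(r'(\w+:)', post)

# Skip parts[0] (text before the first name), then pair each "name:" with
# the value right after it. rstrip() only trims trailing whitespace, so the
# \r\n inside the description is left untouched.
names, values = parts[1::2], parts[2::2]
details = [name + value.rstrip() for name, value in zip(names, values)]
print(details)
```

Because names and values stay separate until the last line, the same pairing could just as easily build a dict instead of a list.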

Thanks ahead of time for reading!

Captain Cornfield Keyboard asked May 01 '26 16:05



1 Answer

Use a look-ahead assertion:

>>> import re
>>> re.split(r'\s(?=\w+:)', post)
['price:44.55',
 'name:John Doe',
 'title:Super Widget',
 'description:This widget slices, dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!']

Of course, this will still fail if a value contains a word immediately followed by a colon.
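For the re.findall() half of the question, a possible variant (a sketch, assuming the same `post` string and the same rule that a name is `\w+` followed by a colon) keeps the lazy `.*?` from the second attempt but uses a look-ahead to say where it may stop: just before the next whitespace-plus-`name:`, or at the very end of the string (`\Z`). The names `pattern`, `details` and `fields` are illustrative only.

```python
import re

post = ("price:44.55 name:John Doe title:Super Widget description:This widget slices, "
        "dices, and drives your kids to soccer practice\r\nIt even comes with Super Widget Mini!")

# Lazy .*? alone stops immediately (too stingy); greedy .* runs to the end
# (too greedy). The look-ahead only lets the value end where the next
# " name:" begins or at the end of the string. re.DOTALL lets '.' cross \r\n.
pattern = re.compile(r'\w+:.*?(?=\s+\w+:|\Z)', re.DOTALL)
details = pattern.findall(post)
print(details)

# Splitting each item once on ':' gives a dict, if that is more convenient.
fields = dict(item.split(':', 1) for item in details)
print(fields['name'])
```

It shares the limitation of the split version: a word followed directly by a colon inside a value would still be treated as the start of a new name.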

Pavel Anossov answered May 03 '26 05:05



