
Python: splitting a complex string including parentheses and |

In a test file, I have records in the form

DATA(VALUE1|VALUE2||VALUE4)

and so on.

I'd like to split this string in two passes, the first yielding "DATA", and the second giving me what's inside the parentheses, split at the "|". The second part looks trivial, but so far my attempts at the first were ugly.

I'm more inclined towards a regex than a full parser, as the lines are quite simple in the end.

Asked by Einar on Apr 08 '13 at 13:04



2 Answers

Another suggestion:

>>> s = "DATA(VALUE1|VALUE2||VALUE4)"
>>> import re
>>> matches = re.findall("[^()]+", s)
>>> matches
['DATA', 'VALUE1|VALUE2||VALUE4']
>>> result = {matches[0]: matches[1].split("|")}
>>> result
{'DATA': ['VALUE1', 'VALUE2', '', 'VALUE4']}
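
The same two-pass idea can be made stricter with an anchored pattern, so that a malformed line is rejected instead of being silently mis-split. This is only a minimal sketch: it assumes every record has exactly the shape NAME(...) with no nested parentheses, and the helper name `parse_record` and the group names are hypothetical.

```python
import re

# Hypothetical pattern: one name, then one parenthesised group
# that contains no nested parentheses.
RECORD = re.compile(r"(?P<name>[^()]+)\((?P<body>[^()]*)\)")

def parse_record(line):
    """Return (name, values) for a line shaped like NAME(V1|V2|...)."""
    match = RECORD.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unexpected record format: {line!r}")
    # First pass: the name. Second pass: split the body on "|".
    return match.group("name"), match.group("body").split("|")

print(parse_record("DATA(VALUE1|VALUE2||VALUE4)"))
# ('DATA', ['VALUE1', 'VALUE2', '', 'VALUE4'])
```

Empty fields are kept as empty strings, just as in the session above.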
Answered by Tim Pietzcker on Nov 14 '22 at 22:11


You could do it in one pass with re.split:

In [10]: import re

In [11]: line = 'DATA(VALUE1|VALUE2||VALUE4)'

In [12]: re.split(r'[(|)]', line)
Out[12]: ['DATA', 'VALUE1', 'VALUE2', '', 'VALUE4', '']

And extract the data and values like this:

In [13]: parts = re.split(r'[(|)]', line)

In [14]: data = parts[0]

In [15]: values = parts[1:-1]

In [16]: values
Out[16]: ['VALUE1', 'VALUE2', '', 'VALUE4']
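
The trailing empty string in the split result comes from the closing ")" at the end of the line, which is why the slice stops at -1. Applied to several records, the approach might look like the sketch below. The list `lines` and its second record are made up for illustration, and the code assumes every line ends with ")" and that names are unique.

```python
import re

# Hypothetical sample records; the second one is invented for illustration.
lines = ["DATA(VALUE1|VALUE2||VALUE4)", "OTHER(A|B)"]

records = {}
for line in lines:
    parts = re.split(r"[(|)]", line.strip())
    # parts[0] is the name; parts[-1] is the empty string that follows
    # the closing ")", so it is dropped.
    records[parts[0]] = parts[1:-1]

print(records)
# {'DATA': ['VALUE1', 'VALUE2', '', 'VALUE4'], 'OTHER': ['A', 'B']}
```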
Answered by unutbu on Nov 15 '22 at 00:11