In a test file, I have records in the form
DATA(VALUE1|VALUE2||VALUE4)
and so on.
I'd like to split this string in two passes, the first yielding "DATA", and the second giving me what's inside the parentheses, split at the "|". The second part looks trivial, but so far my attempts at the first were ugly.
I'm more inclined towards a regex than a full parser, as the lines are quite simple in the end.
Splitting a string in Python is simple with the built-in str.split() method. Given a separator, it breaks the string at each occurrence of that separator and returns a list of strings. With no argument, it splits on runs of whitespace, not on commas.
To split on line breaks, pass "\n" as the separator or use str.splitlines(). This works the same way whether the newlines come from \n escapes in an ordinary string literal or from a triple-quoted multi-line string.
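As a small illustration of how str.split() behaves with and without a separator, here is a sketch. The sample strings are made up for this example, apart from the pipe-delimited one, which mirrors the record body in the question.

```python
# Illustrative strings only; "alpha beta  gamma" and "first\nsecond" are made up.
words = "alpha beta  gamma".split()          # no argument: split on runs of whitespace
fields = "VALUE1|VALUE2||VALUE4".split("|")  # explicit separator: empty fields are kept
lines = "first\nsecond".splitlines()         # split at line boundaries

print(words)
print(fields)
print(lines)
```

Note that splitting on an explicit separator preserves the empty field between the two consecutive pipes, which matters for records like the one in the question.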
Another suggestion:
>>> s = "DATA(VALUE1|VALUE2||VALUE4)"
>>> import re
>>> matches = re.findall("[^()]+", s)
>>> matches
['DATA', 'VALUE1|VALUE2||VALUE4']
>>> result = {matches[0]: matches[1].split("|")}
>>> result
{'DATA': ['VALUE1', 'VALUE2', '', 'VALUE4']}
You could do it in one pass with re.split:
In [10]: import re
In [11]: line = 'DATA(VALUE1|VALUE2||VALUE4)'
In [12]: re.split(r'[(|)]', line)
Out[12]: ['DATA', 'VALUE1', 'VALUE2', '', 'VALUE4', '']
And extract the data and values like this:
In [13]: parts = re.split(r'[(|)]', line)
In [14]: data = parts[0]
In [15]: values = parts[1:-1]
In [16]: values
Out[16]: ['VALUE1', 'VALUE2', '', 'VALUE4']
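If you would rather keep the two passes the question asks for and also reject malformed lines, something along these lines might work. This is only a sketch: the helper name parse_record and the RECORD pattern are hypothetical, and it assumes each record is a name followed by exactly one parenthesised group with no nested parentheses.

```python
import re

# Hypothetical pattern: a name, then one parenthesised body.
# Assumes neither part contains parentheses.
RECORD = re.compile(r"([^()]+)\(([^()]*)\)")

def parse_record(line):
    """Return (name, values) for a line shaped like NAME(V1|V2||V4)."""
    match = RECORD.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised record: {line!r}")
    name, body = match.groups()   # first pass: name and the text inside the parentheses
    return name, body.split("|")  # second pass: split the body on the pipe

name, values = parse_record("DATA(VALUE1|VALUE2||VALUE4)")
print(name, values)
```

Unlike the re.split and re.findall approaches, this raises an error on lines that do not fit the expected shape instead of silently returning odd pieces. One thing to be aware of: an empty body such as DATA() yields a list containing a single empty string, because that is how str.split() treats an empty input with an explicit separator.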