I'm trying to perform a string split on a set of somewhat irregular data that looks something like:
\n\tName: John Smith \n\t Home: Anytown USA \n\t Phone: 555-555-555 \n\t Other Home: Somewhere Else \n\t Notes: Other data \n\tName: Jane Smith \n\t Misc: Data with spaces
I'd like to convert this into a tuple/dict by later splitting each entry on the colon (:), but first I need to get rid of all the extra whitespace. I'm guessing a regex is the best way, but I can't seem to get one that works. Below is my attempt:
data_string.split('\n\t *')
Just use .strip(); it removes all leading and trailing whitespace for you, including tabs and newlines. The splitting itself can then be done with data_string.splitlines():
[s.strip() for s in data_string.splitlines()]
Output:
>>> [s.strip() for s in data_string.splitlines()]
['Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555', 'Other Home: Somewhere Else', 'Notes: Other data', 'Name: Jane Smith', 'Misc: Data with spaces']
You can now even inline the splitting on ': ' as well:
>>> [s.strip().split(': ') for s in data_string.splitlines()]
[['Name', 'John Smith'], ['Home', 'Anytown USA'], ['Phone', '555-555-555'], ['Other Home', 'Somewhere Else'], ['Notes', 'Other data'], ['Name', 'Jane Smith'], ['Misc', 'Data with spaces']]
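For the regex route the question was attempting: str.split() treats its argument as a literal substring rather than a pattern, so '\n\t *' never matches. A minimal sketch using re.split() instead, assuming data_string holds exactly the sample from the question (the names lines and pairs are only illustrative):

```python
import re

# Sample data copied from the question, including its leading newline.
data_string = (
    "\n\tName: John Smith \n\t Home: Anytown USA \n\t Phone: 555-555-555 "
    "\n\t Other Home: Somewhere Else \n\t Notes: Other data "
    "\n\tName: Jane Smith \n\t Misc: Data with spaces"
)

# Strip the outer whitespace first, then split on each newline together
# with any whitespace surrounding it.
lines = re.split(r"\s*\n\s*", data_string.strip())

# Split each line on the first colon only, consuming the whitespace after it,
# so a value that itself contains a colon stays in one piece.
pairs = [tuple(re.split(r":\s*", line, maxsplit=1)) for line in lines]

print(lines)
print(pairs)
```

Whether this is preferable to strip() plus splitlines() is mostly a matter of taste; the regex version just avoids a separate strip per line.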
An alternative that loops explicitly and skips blank lines, where s is the input string (a shortened sample in this session):
>>> ary = []
>>> for line in s.splitlines():
...     line = line.strip()
...     if not line: continue
...     ary.append(line.split(":"))
...
>>> ary
[['Name', ' John Smith'], ['Home', ' Anytown USA'], ['Misc', ' Data with spaces']]
>>> dict(ary)
{'Home': ' Anytown USA', 'Misc': ' Data with spaces', 'Name': ' John Smith'}
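Two things to watch for with the full data from the question: splitting on ":" alone leaves a leading space in each value, and a single dict keeps only the last value for a repeated key such as Name. A possible sketch that groups the lines into one dict per person, assuming every record begins with a Name line (parse_records and first_key are hypothetical names, not part of any library):

```python
def parse_records(text, first_key="Name"):
    """Group 'Key: value' lines into one dict per record (illustrative helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. the one before the first record
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # no colon on this line, so it is not a key/value pair
        key = key.strip()
        if key == first_key or not records:
            records.append({})  # assumption: each first_key line starts a new record
        records[-1][key] = value.strip()
    return records


# Sample data copied from the question.
data_string = (
    "\n\tName: John Smith \n\t Home: Anytown USA \n\t Phone: 555-555-555 "
    "\n\t Other Home: Somewhere Else \n\t Notes: Other data "
    "\n\tName: Jane Smith \n\t Misc: Data with spaces"
)

records = parse_records(data_string)
for record in records:
    print(record)
```

If the records are delimited some other way in the real data, the condition that starts a new dict would need to change accordingly.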