
Split string with multiple separators from an array (Python)

Given an array of separators:

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]

and a string where some columns were left blank (and there is random white space):

input = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

How can I get this:

["John", "123:45", "8/2/17", "", "", "i love notes"]

I've tried simply removing the separators to see where I can go from there, but I'm still stuck:

import re
input = re.sub(r'|'.join(map(re.escape, columns)), "", input)
almino asked Jan 29 '23 21:01


2 Answers

Use the list to build a regular expression by appending (.*) to each separator, then use strip to remove the surrounding whitespace:

import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

result = [x.strip() for x in re.match("".join(map("{}(.*)".format,columns)),s).groups()]

print(result)

yields:

['John', '123:45', '8/2/17', '', '', 'i love notes']
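One caveat with this one-liner: re.match returns None when the string does not contain every separator in the listed order, so chaining .groups() onto it raises AttributeError. A minimal sketch of a guarded version, using the same pattern as above; the helper name extract_fields is hypothetical and not part of the original answer:

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]

def extract_fields(separators, text):
    """Return the stripped text after each separator, or None when the
    separators do not all appear in order. (Hypothetical helper.)"""
    # Same pattern as above: each separator followed by a capture group.
    pattern = "".join(map("{}(.*)".format, separators))
    match = re.match(pattern, text)
    if match is None:
        return None
    return [group.strip() for group in match.groups()]

print(extract_fields(columns, "Name: JohnID:7Date:Building:Room:Notes:"))
print(extract_fields(columns, "Name: John"))  # later separators are missing
```

The second call returns None instead of raising, which leaves it to the caller to decide how to treat malformed input.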

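The stripping can also be handled by the regular expression itself, at the expense of a more complex pattern but a simpler overall expression. The groups then have to be lazy, (.*?), so that the trailing \s* can absorb the whitespace, and re.fullmatch is needed so that the last lazy group extends to the end of the string. Note that groups() returns a tuple rather than a list:

result = re.fullmatch("".join(map(r"{}\s*(.*?)\s*".format, columns)), s).groups()

More robust: if the separators contain regex special characters, they have to be escaped with re.escape (not the case here):

result = re.fullmatch("".join([r"{}\s*(.*?)\s*".format(re.escape(x)) for x in columns]), s).groups()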
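If each value should stay paired with its label, the captured groups can be zipped with the separator names. A sketch, assuming every separator appears exactly once and in the listed order; parse_record is a hypothetical name, and stripping the trailing colon from the keys is just one possible choice:

```python
import re

def parse_record(separators, text):
    """Map each separator (minus its trailing colon) to its trimmed value.

    Hypothetical helper; returns None when the separators are not all
    present in the given order.
    """
    # Lazy groups (.*?) let the following \s* absorb trailing whitespace;
    # fullmatch forces the last lazy group to extend to the end of the text.
    pattern = "".join(r"{}\s*(.*?)\s*".format(re.escape(sep)) for sep in separators)
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    return {sep.rstrip(":"): value for sep, value in zip(separators, match.groups())}

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes  "
record = parse_record(columns, s)
print(record["Name"], "|", record["Notes"])
```

The sample string here has extra trailing spaces to show that whitespace is trimmed on both sides of a value.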
Jean-François Fabre answered Feb 02 '23 10:02


How about using re.split?

>>> import re
>>> columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
>>> i = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"
>>> re.split('|'.join(map(re.escape, columns)), i)
['', '      John', '123:45', '  8/2/17', '', '', '  i love notes']

To get rid of the whitespace, split on whitespace too:

>>> re.split(r'\s*' + (r'\s*|\s*'.join(map(re.escape, columns))) + r'\s*', i.strip())
['', 'John', '123:45', '8/2/17', '', '', 'i love notes']
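The first element of the re.split result is whatever precedes the first separator, which is an empty string here, so it has to be dropped to get the list asked for in the question. A sketch of that step wrapped in a hypothetical helper, split_fields, using a non-capturing group to build an equivalent pattern:

```python
import re

def split_fields(separators, text):
    """Split text on any separator, trimming whitespace around each one.
    (Hypothetical helper built on the re.split call above.)"""
    alternatives = "|".join(map(re.escape, separators))
    # (?:...) groups the alternatives without adding them to the output.
    pattern = r"\s*(?:{})\s*".format(alternatives)
    parts = re.split(pattern, text.strip())
    # parts[0] is the text before the first separator (empty here).
    return parts[1:]

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
i = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"
print(split_fields(columns, i))
```

Unlike the re.match approach, re.split does not check that the separators appear in order, and it also splits on separator text that happens to occur inside a value, so it suits input where the labels cannot appear in the data.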
Artyer answered Feb 02 '23 09:02