My data looks like this:
string = ' streptococcus 7120 "File being analysed" rd873 '
I tried splitting the line with n = string.split(), which gives:
['streptococcus', '7120', 'File', 'being', 'analysed', 'rd873']
I would like to split the string on whitespace, but keep any whitespace inside double quotes.
# expected output:
['streptococcus', '7120', 'File being analysed', 'rd873']
Use re.findall with a suitable regex. I'm not sure what your error cases look like (what if there is an odd number of quotes?), but:
import itertools as it
import re
list(filter(None, it.chain(*re.findall(r'"([^"]*?)"|(\S+)', ' streptococcus 7120 "File being analysed" rd873 "hello!" hi'))))
> ['streptococcus',
'7120',
'File being analysed',
'rd873',
'hello!',
'hi']
looks right.
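For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the name split_keep_quoted is made up for illustration, and it also shows one way the odd-number-of-quotes case plays out with this particular regex.

```python
import re

# Same pattern as above: group 1 is the content of a double-quoted run,
# group 2 is a bare run of non-whitespace characters.
TOKEN_RE = re.compile(r'"([^"]*?)"|(\S+)')

def split_keep_quoted(text):
    """Hypothetical helper: split on whitespace, keeping quoted runs whole."""
    tokens = []
    for quoted, bare in TOKEN_RE.findall(text):
        # At most one group is non-empty per match; an empty quoted
        # string ("") gives two empty groups and is skipped.
        token = quoted or bare
        if token:
            tokens.append(token)
    return tokens

result = split_keep_quoted(' streptococcus 7120 "File being analysed" rd873 ')
print(result)

# An unmatched quote does not raise; it stays attached to its word.
unbalanced = split_keep_quoted('a "b c')
print(unbalanced)
```

With an unmatched quote the quoted alternative fails, so the \S+ branch picks up the quote character as part of the word. Whether that is acceptable depends on your data.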
You want shlex.split, which splits on whitespace and keeps quoted substrings together as single items.
import shlex
string = ' streptococcus 7120 "File being analysed" rd873 '
items = shlex.split(string)
This won't collapse extra spaces inside the quoted substrings, but you can do that with a list comprehension:
items = [" ".join(x.split()) for x in shlex.split(string)]
Look, ma, no regex!
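Putting the two steps together, and covering the unbalanced-quote case raised in the other answer, might look like the sketch below. The function name split_quoted and the choice to fall back to a plain split() are assumptions for illustration, not the only reasonable behaviour.

```python
import shlex

def split_quoted(text):
    """Hypothetical wrapper: shlex.split plus whitespace normalisation."""
    try:
        parts = shlex.split(text)
    except ValueError:
        # shlex.split raises ValueError when a quote is never closed.
        # Falling back to a plain whitespace split is an assumption here;
        # re-raising may suit your data better.
        return text.split()
    # Collapse runs of whitespace inside each item to a single space.
    return [" ".join(p.split()) for p in parts]

items = split_quoted(' streptococcus 7120 "File  being   analysed" rd873 ')
print(items)
```

Note that shlex follows shell-like rules, so single quotes and backslashes are also treated specially. A stray apostrophe, as in don't, counts as an unclosed quote and takes the fallback path here.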