
Using Python, how do I split on multiple delimiters and keep only one in my output list?

A very green Python user here, so go easy on me; the docs haven't helped me understand what I'm missing. Similar to RE split multiple arguments | (or) returns none python, I need to split a string on multiple delimiters. The above question only allows either keeping none or keeping both delimiters, whereas I need to keep only one of them. Note that the above question is from 2012, so it likely concerns a much earlier version of Python than 3.6, which I'm using.

My data:

line = 'APPLE,ORANGE CHERRY APPLE'

I want a list returned that looks like:

['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']

I need to keep the comma so I can remove duplicate components later. I have that part working if I could just get the list created properly. Here's what I've got.

import re
parts = re.split(r'\s|(,)', line)  # named parts to avoid shadowing the built-in list
print(parts)

My logic here is to split on whitespace and comma but keep only the comma, which makes sense to me. Nope:

['APPLE', ',', 'ORANGE', None, 'CHERRY', None, 'APPLE']

I've also tried what is mentioned in the above linked question, to put the entire group in a capture:

re.split(r'(\s|(,))',line)

Nope again:

['APPLE', ',', ',', 'ORANGE', ' ', None, 'CHERRY', ' ', None, 'APPLE']

What am I missing? I know it's related to how my capture groups are set up but I can't figure it out. Thanks in advance!

J-T asked Nov 24 '25 18:11



1 Answer

I suggest using a matching approach with

re.findall(r'[^,\s]+|,', line)

See the regex demo. The [^,\s]+|, pattern matches

  • [^,\s]+ - one or more chars other than a comma and whitespace
  • | - or
  • , - a comma.
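
To see how the two alternatives behave on messier input, here is a small sketch. The sample string `messy` is made up for illustration and does not come from the question. Runs of whitespace are skipped because neither alternative matches them, while every comma becomes its own item.

```python
import re

# Hypothetical input: a comma followed by a space, and a double space.
messy = 'APPLE, ORANGE  CHERRY,APPLE'

# Each match is either a run of non-comma, non-whitespace chars or a single comma.
tokens = re.findall(r'[^,\s]+|,', messy)
print(tokens)  # ['APPLE', ',', 'ORANGE', 'CHERRY', ',', 'APPLE']
```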

See a Python demo:

import re
line = 'APPLE,ORANGE CHERRY APPLE'
tokens = re.findall(r'[^,\s]+|,', line)
print(tokens) # => ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
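
As for why the re.split attempts in the question produced None: when the pattern contains capturing groups, re.split also returns the text of each group for every split, and a group that did not take part in the match (here (,) when the split happened on whitespace) is reported as None. If you would rather stay with re.split, one possible sketch is to filter those entries out afterwards; the names `raw` and `parts` are only illustrative.

```python
import re

line = 'APPLE,ORANGE CHERRY APPLE'

# A split on whitespace leaves the (,) group unmatched, so re.split emits None there.
raw = re.split(r'\s|(,)', line)

# Drop None (and any empty strings) so only the words and the commas remain.
parts = [p for p in raw if p]
print(parts)  # ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
```

The re.findall version avoids the filtering step, which is why it reads more directly for this case.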
Wiktor Stribiżew answered Nov 26 '25 07:11
