Why are there extra empty strings at the beginning and end of the list returned by re.split?

Tags: python, regex

I'm trying to take a string of ints and/or floats and create a list of floats. The string contains brackets that need to be ignored. I'm using re.split, but if my string begins and ends with a bracket, I get extra empty strings. Why is that?

Code:

import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
# Split on every run of characters that are neither digits nor dots
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))

Output:

['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']

asked Jun 18 '15 19:06

user4794127

3 Answers

re.split returns the text between delimiter matches, so a delimiter at the very beginning or end of the string leaves an empty string at the beginning or end of the resulting list.
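To make the rule concrete, here is a minimal sketch showing that the number of pieces is always one more than the number of delimiter matches, and that str.split behaves the same way. It reuses the pattern and input from the question; the variable names delimiters and pieces are only illustrative.

```python
import re

# A delimiter at either edge leaves an empty piece outside it,
# for plain str.split as well as for re.split.
print(",a,b,".split(","))        # ['', 'a', 'b', '']
print(re.split(r",", ",a,b,"))   # ['', 'a', 'b', '']

x = "[1 2 3 4][2 3 4 5]"
p = re.compile(r"[^\d.]+")

delimiters = p.findall(x)   # every run of non-digit, non-dot characters
pieces = p.split(x)         # the text before, between, and after those runs

# One more piece than delimiters; the first and last pieces are empty
# because the string starts and ends with a delimiter.
print(len(delimiters), len(pieces))   # 9 10
print(pieces[0] == "", pieces[-1] == "")
```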

If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.

Example:

import re

a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))

Output:

['', '1', '2', '3', '4', '']
['1', '2', '3', '4']

As others have pointed out in their answers, this may not be the perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.
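For the original goal of getting a list of floats, the findall approach might be extended along these lines. This is only a sketch: parse_floats is a hypothetical helper name, and the pattern assumes unsigned numbers written as digits with an optional fractional part (no signs, exponents, or forms like ".5").

```python
import re

def parse_floats(text):
    """Return every unsigned decimal number in text as a float.

    Hypothetical helper for illustration; assumes numbers look like
    '12' or '3.75' and that everything else is a delimiter.
    """
    # Match the numbers themselves instead of splitting on delimiters,
    # so leading or trailing brackets never produce empty strings.
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", text)]

print(parse_floats("[1 2.5 3][4 0.25]"))  # [1.0, 2.5, 3.0, 4.0, 0.25]
```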

answered Oct 19 '22 17:10

Florian Winter

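A more Pythonic way is a list comprehension that uses str.isdigit() to check whether each character is a digit. Note that this works character by character, so it only gives the right result when every number is a single digit: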

>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']

As for your code: first, you need to split on spaces or brackets, which can be done with [\[\] ]+. To get rid of the empty strings caused by the leading and trailing brackets, you can strip the brackets from the string first:

>>> y =  "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y =  "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']

You can also wrap the result in filter with bool to drop the empty strings (in Python 3, filter returns an iterator, so wrap it in list()):

>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']

answered Oct 19 '22 17:10

Mazdak

You can just use filter to avoid empty results:

import re

x = "[1 2 3 4][2 3 4 5]"

# filter(None, ...) drops falsy items, i.e. the empty strings
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
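One thing to keep in mind with this approach on Python 3 is that filter returns a lazy iterator rather than a list. The sketch below shows that, along with an equivalent list comprehension; the names parts, lazy, and tokens are illustrative only.

```python
import re

x = "[1 2 3 4][2 3 4 5]"
parts = re.split(r"[^\d.]+", x)   # includes '' at both ends

# On Python 3, filter gives back an iterator object, not a list.
lazy = filter(None, parts)
print(type(lazy).__name__)        # filter

# A list comprehension builds the list directly; `if s` skips
# empty strings the same way filter(None, ...) does.
tokens = [s for s in parts if s]
print(tokens)                     # ['1', '2', '3', '4', '2', '3', '4', '5']
```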

answered Oct 19 '22 16:10

anubhava
