
Why are there extra empty strings at the beginning and end of the list returned by re.split?

Tags: python, regex

I'm trying to take a string of ints and/or floats and create a list of floats. The string contains brackets that need to be ignored. I'm using re.split, but if my string begins and ends with a bracket, I get extra empty strings. Why is that?

Code:

import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))

Output:

['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']
user4794127 asked Jun 18 '15 19:06




3 Answers

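If you use re.split, a delimiter match at the very beginning or end of the string produces an empty string at the beginning or end of the resulting list. re.split returns the text on both sides of every match, and the text before a leading delimiter (or after a trailing one) is empty.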
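A quick sketch of this behaviour (the sample string is made up for illustration); str.split with an explicit separator behaves the same way:

```python
import re

s = "[1 2][3 4]"  # hypothetical sample input

# The pattern matches "[" at the start, " ", "][", " ", and "]" at the end.
# The pieces before the first match and after the last match are both empty.
parts = re.split(r'[^\d.]+', s)
print(parts)  # ['', '1', '2', '3', '4', '']

# str.split with an explicit separator shows the same behaviour.
print(",a,b,".split(","))  # ['', 'a', 'b', '']
```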

If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.

Example:

import re

a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))

Output:

['', '1', '2', '3', '4', '']
['1', '2', '3', '4']

As others have pointed out in their answers, this may not be the perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.
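Applied to the original question, the findall approach might look like the sketch below. The helper name parse_floats is hypothetical, and the pattern assumes every number is written as digits optionally followed by a dot and more digits (no signs, no exponents, no leading dot).

```python
import re

def parse_floats(text):
    """Return every number in text as a float, ignoring brackets and spaces.

    Hypothetical helper for illustration only.
    """
    # \d+(?:\.\d+)? matches an integer such as "2" or a decimal such as "3.5".
    # Assumption: inputs like "-1", "1e3" or ".5" do not occur.
    return [float(tok) for tok in re.findall(r'\d+(?:\.\d+)?', text)]

print(parse_floats("[1 2 3 4][2 3 4 5]"))  # [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
print(parse_floats("[1.5 2][10 0.25]"))    # [1.5, 2.0, 10.0, 0.25]
```

Because findall only returns the matches themselves, there are no empty strings to filter out before calling float.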

Florian Winter answered Oct 19 '22 17:10



As a more Pythonic way, you can use a list comprehension with the str.isdigit() method to check whether each character is a digit (this only works when every number is a single-digit integer):

>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']
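To illustrate with a made-up string containing a multi-digit integer and a float, here is a minimal sketch of how the character-by-character check breaks numbers apart:

```python
s = "[12 3.5]"  # hypothetical input with a multi-digit int and a float

# Iterating over a string yields single characters, so "12" and "3.5"
# are split into their individual digits and the decimal point is lost.
digits = [ch for ch in s if ch.isdigit()]
print(digits)  # ['1', '2', '3', '5']
```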

As for your code: first, you need to split on spaces or brackets, which can be done with [\[\] ]+. To get rid of the empty strings caused by the leading and trailing brackets, you can strip the string first:

>>> y =  "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y =  "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']

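You can also pass the result through filter with bool to drop the empty strings (wrap it in list() so that it also prints as a list on Python 3, where filter returns an iterator):

>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']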
Mazdak answered Oct 19 '22 17:10



You can just use filter to drop the empty strings:

import re

x = "[1 2 3 4][2 3 4 5]"

# filter(None, ...) keeps only truthy items; list() is needed on Python 3
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
anubhava answered Oct 19 '22 16:10
