I'm trying to take a string of ints and/or floats and create a list of floats. The string contains brackets that need to be ignored. I'm using re.split, but if my string begins and ends with a bracket, I get extra empty strings. Why is that?
Code:
import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))
Output:
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']
In Java, String.split works in two steps. First, it splits on every occurrence of the delimiter; a natural consequence is that if the string does not contain the delimiter, a singleton array containing just the input string is returned. Second, it removes all trailing empty strings. This is why ",,,".split(",") returns an empty array there.
The split() method does not change the value of the original string. If the delimiter is an empty string, the split() method will return an array of elements, one element for each character of string. If you specify an empty string for string, the split() method will return an empty string and not an array of strings.
The empty string is a legitimate string, upon which most string operations should work. Some languages treat some or all of the following in similar ways: empty strings, null references, the integer 0, the floating point number 0, the Boolean value false, the ASCII character NUL, or other such values.
In Python, when an empty string is split with str.split, the first mode (no argument) returns an empty list, because the whitespace is eaten and there are no values to put in the result list.
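For Python's str.split specifically, the difference between the no-argument mode and an explicit separator can be checked with a minimal sketch like the following (the variable names are only illustrative):

```python
# With no argument, str.split() splits on runs of whitespace and drops empty fields.
no_arg_empty = "".split()
no_arg_blank = "   ".split()

# With an explicit separator, empty fields are kept, including leading and trailing ones.
with_sep_empty = "".split(",")
with_sep_commas = ",,,".split(",")

print(no_arg_empty)     # []
print(no_arg_blank)     # []
print(with_sep_empty)   # ['']
print(with_sep_commas)  # ['', '', '', '']
```

re.split behaves like the explicit-separator mode, which is why a delimiter at either end of the string yields an empty string in the result.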
If you use re.split, then a delimiter at the beginning or end of the string causes an empty string at the beginning or end of the list in the result.
If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.
Example:
import re
a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))
Output:
['', '1', '2', '3', '4', '']
['1', '2', '3', '4']
As others have pointed out in their answers, this may not be the perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.
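Since the question asks for a list of floats, the re.findall idea can be extended to decimal numbers. The following is only a sketch: the pattern and the parse_floats helper are my own assumptions, and they deliberately ignore signs and exponents.

```python
import re

# Hypothetical pattern: digits with an optional fractional part, or a bare ".5" form.
NUMBER = re.compile(r'\d+(?:\.\d+)?|\.\d+')

def parse_floats(text):
    """Return every unsigned decimal number found in text as a float."""
    return [float(tok) for tok in NUMBER.findall(text)]

floats = parse_floats("[1 2.5 3 4][2 3 .75 5]")
print(floats)  # [1.0, 2.5, 3.0, 4.0, 2.0, 3.0, 0.75, 5.0]
```

Because findall only returns the matches themselves, no empty strings appear, whether or not the input starts or ends with a bracket.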
A more Pythonic way is to use a list comprehension with the str.isdigit() method to check whether each character is a digit (this works here only because every number is a single digit):
>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']
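One caveat with the character-by-character check: it breaks multi-digit and decimal numbers apart. A small sketch with a made-up input string s shows the difference, next to a token-based alternative that uses no regex:

```python
s = "[10 2.5 3][4 5]"

# Character by character: multi-digit and decimal numbers are split into single digits.
per_char = [c for c in s if c.isdigit()]
print(per_char)  # ['1', '0', '2', '5', '3', '4', '5']

# Token by token: turn brackets into spaces, then split on whitespace.
tokens = s.replace('[', ' ').replace(']', ' ').split()
print(tokens)    # ['10', '2.5', '3', '4', '5']
```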
As for your code: first, you need to split on spaces or brackets, which can be done with the character class [\[\] ]. To get rid of the empty strings caused by the leading and trailing brackets, you can strip your string first:
>>> y = "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y = "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']
You can also wrap your result in the filter function, using bool as the predicate:
>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']
You can just use filter to avoid empty results:
x = "[1 2 3 4][2 3 4 5]"
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
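To get from the filtered strings to the list of floats the question asks for, one option is a comprehension that drops the empty strings and converts in the same pass. This is a sketch assuming Python 3, where filter returns a lazy iterator rather than a list; the names parts and values are illustrative.

```python
import re

x = "[1 2 3 4][2 3 4 5]"

# In Python 3, filter() is lazy, so wrap it in list() to see the items.
parts = list(filter(None, re.split(r'[^\d.]+', x)))
print(parts)   # ['1', '2', '3', '4', '2', '3', '4', '5']

# A comprehension skips empty strings and converts each token to float.
values = [float(tok) for tok in re.split(r'[^\d.]+', x) if tok]
print(values)  # [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
```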