Here is the Python code I used to split the letters and digits out of an alphanumeric string:
input_string = 'abcdefghijklmnopqrstuvwxyz1234567890'
import re
print re.search('[a-z]*', input_string).group()
print re.search('[0-9]*', input_string).group()
In the output I get the string of letters but not the string of digits. If I modify the code as follows, the output shows the digits:
print re.search('[0-9]*$', input_string).group()
I am used to grep, and I found that its functionality is similar to that of the re module. If I run the following command in a shell, I get the desired result:
echo "abcdefghijklmnopqrstuvwxyz1234567890" | grep "[0-9]*"
Am I missing something here?
I suggest you use re.findall (which does a global match) instead of re.search, because re.search returns only the first match.
>>> input_string = 'abcdefghijklmnopqrstuvwxyz1234567890'
>>> print re.findall(r'\d+|[a-z]+', input_string)
['abcdefghijklmnopqrstuvwxyz', '1234567890']
Also, don't use [a-z]* here, because it can match empty strings as well. * repeats the previous token zero or more times, whereas + repeats it one or more times.
>>> print re.search(r'\d+', input_string).group()
1234567890
>>> print re.search(r'[a-z]+', input_string).group()
abcdefghijklmnopqrstuvwxyz
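To see the difference between * and + side by side, here is a small sketch. The short sample string 'ab12' is my own, chosen only to keep the output readable, and print() is called with parentheses so the snippet runs on both Python 2 and Python 3.

```python
import re

sample = 'ab12'  # hypothetical short input, not taken from the question

# '*' allows zero repetitions, so findall also reports empty matches
# at positions where no letter is present.
star_matches = re.findall(r'[a-z]*', sample)

# '+' needs at least one character, so only real runs are reported.
plus_matches = re.findall(r'[a-z]+|\d+', sample)

print(star_matches)
print(plus_matches)
```

The first list starts with 'ab' but also contains empty strings; the second contains only the letter run and the digit run.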
Why does the first one work while the second fails?
>>> print re.search('[a-z]*', input_string).group()
abcdefghijklmnopqrstuvwxyz
>>> print re.search('[0-9]*', input_string).group()
>>>
* repeats the previous token zero or more times, i.e. it can also match the empty string that exists before any non-matching character. The first pattern, [a-z]*, returns abcdefghijklmnopqrstuvwxyz because that substring is located at the start of the input. If the input were 8abcdefghijklmnopqrstuvwxyz, it would return an empty string. This is because re.search stops after finding the first match. Here 8 is not matched by [a-z], so, as I said, [a-z]* matches the empty string that exists just before the 8.
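The leading-digit case can be checked directly. This is a minimal sketch using the 8-prefixed input mentioned above; span() is used to show where each match sits.

```python
import re

prefixed = '8abcdefghijklmnopqrstuvwxyz'

star = re.search(r'[a-z]*', prefixed)
plus = re.search(r'[a-z]+', prefixed)

# '*' succeeds immediately at index 0 with a zero-length match.
print(repr(star.group()), star.span())

# '+' cannot match at index 0, so the search moves on to index 1.
print(repr(plus.group()), plus.span())
```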
regex = [0-9]*, string = "abcdefghijklmnopqrstuvwxyz1234567890"
re.search stops after finding the first match. Here a is not matched by [0-9], but [0-9]* matches the empty string that exists before a, because * repeats the previous token zero or more times. That's why you got an empty string as output in the second case.
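Conceptually, re.search tries the pattern at index 0, then index 1, and so on, and returns the first attempt that succeeds. The helper below, first_success, is a hypothetical name of mine that imitates that scan with the compiled pattern's match(string, pos) method. It is a sketch of the idea, not a description of how the re module is implemented internally.

```python
import re

input_string = 'abcdefghijklmnopqrstuvwxyz1234567890'

def first_success(pattern, text):
    """Return (start_index, matched_text) for the first index where pattern matches."""
    compiled = re.compile(pattern)
    for pos in range(len(text) + 1):
        m = compiled.match(text, pos)  # attempt anchored at this index only
        if m is not None:
            return pos, m.group()
    return None

star_result = first_success(r'[0-9]*', input_string)
plus_result = first_success(r'[0-9]+', input_string)

print(star_result)
print(plus_result)
```

With [0-9]* the very first attempt at index 0 succeeds with an empty string, so the scan never reaches the digits. With [0-9]+ every attempt fails until the scan arrives at the first digit.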
>>> print re.search('[0-9]*$', input_string).group()
1234567890
Since we added an end-of-line anchor, it searches for zero or more digits at the end of the line. It returns an empty string as the match if there are no digits at the end.
>>> print re.search('[0-9]*$', '12foo').group()
>>>
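If an empty result is not useful, one option is to combine + with the anchor, so that the search reports no match at all instead of a zero-length one. A small sketch, assuming None is what you want in that case (the input 'foo12' is my own addition for contrast):

```python
import re

# With '*', a zero-length match at the very end always succeeds.
empty_tail = re.search(r'[0-9]*$', '12foo')
print(empty_tail.span())

# With '+', at least one trailing digit is required, so search returns None.
no_tail = re.search(r'[0-9]+$', '12foo')
print(no_tail)

# When trailing digits exist, the '+' form finds them.
digit_tail = re.search(r'[0-9]+$', 'foo12')
print(digit_tail.group())
```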
In output i am getting the string of letters but not getting the string of digits.
I just checked both Ruby and Perl as well, and they produce the same results.
The digit pattern
matches:
However, re.search() only returns the first match.
The lower case letter pattern
matches:
if i run the following command in shell i get the desired result:
echo "abcdefghijklmnopqrstuvwxyz1234567890" | grep "[0-9]*"
In a bash shell, I get:
$ echo "abcdefghijklmnopqrstuvwxyz1234567890" | grep "[0-9]*"
abcdefghijk
And I get similar strange results with echo, grep, and other patterns.
Response to comment:
$ bash --version
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin10.0)
Copyright (C) 2007 Free Software Foundation, Inc.
$ echo "abc123" | grep -o "[a-z]*"
abc
$ echo "abc123" | grep -o "[0-9]*"
$ echo "abc123" | grep -o "[0-9]*$"
123
$
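The difference between plain grep and grep -o can be imitated in Python. Plain grep prints every line that contains a match, and a zero-length match counts, so "[0-9]*" selects every line regardless of its content. The two helpers below, grep_lines and grep_only_matching, are hypothetical names for this sketch, and Python's re syntax is not identical to grep's regular expressions. The -o imitation assumes that empty matches are skipped and scanning continues, which is my understanding of current GNU grep, so it reports 123 where the grep in the transcript above printed nothing.

```python
import re

def grep_lines(pattern, lines):
    """Like plain grep: keep whole lines in which the pattern matches anywhere."""
    return [line for line in lines if re.search(pattern, line) is not None]

def grep_only_matching(pattern, lines):
    """Roughly like grep -o: keep only the non-empty matched parts."""
    parts = []
    for line in lines:
        parts.extend(m.group() for m in re.finditer(pattern, line) if m.group())
    return parts

lines = ['abc123']
whole_lines = grep_lines(r'[0-9]*', lines)
digit_parts = grep_only_matching(r'[0-9]*', lines)
letter_parts = grep_only_matching(r'[a-z]*', lines)

print(whole_lines)
print(digit_parts)
print(letter_parts)
```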