We have a bunch of strings, for example: c1309, IF1306, v1309, p1209, a1309, mo1309.
In Python, what is the best way to strip out the numbers? All I need is c, IF, v, p, a, mo from the above example.
Python's built-in re module provides a sub() function that can delete the numbers from a string: re.sub(pattern, repl, string) replaces every non-overlapping match of pattern in string with repl, so an empty replacement string simply deletes the matches.
You can use a regex:
>>> import re
>>> strs = "c1309, IF1306, v1309, p1209, a1309, mo1309"
>>> re.sub(r'\d','',strs)
'c, IF, v, p, a, mo'
or a faster version, which removes each run of digits in a single replacement:
>>> re.sub(r'\d+','',strs)
'c, IF, v, p, a, mo'
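One plausible reason the \d+ version is faster is that it performs far fewer replacements. The sketch below uses re.subn, which returns the new string together with the number of substitutions made, to count them on the sample string; it does not itself measure speed.

```python
import re

strs = "c1309, IF1306, v1309, p1209, a1309, mo1309"

# re.subn returns (new_string, number_of_substitutions_made)
per_digit, n_digit = re.subn(r'\d', '', strs)  # one replacement per digit
per_run, n_run = re.subn(r'\d+', '', strs)     # one replacement per run of digits

print(per_digit == per_run)  # True: both patterns give the same result
print(n_digit, n_run)        # 24 6
```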
timeit comparisons (Python 2, using IPython's %timeit):
>>> strs = "c1309, IF1306, v1309, p1209, a1309, mo1309"*10**5
>>> %timeit re.sub(r'\d','',strs)
1 loops, best of 3: 1.23 s per loop
>>> %timeit re.sub(r'\d+','',strs)
1 loops, best of 3: 480 ms per loop
>>> %timeit ''.join([c for c in strs if not c.isdigit()])
1 loops, best of 3: 1.07 s per loop
# winner (Python 2 only: in Python 3, str.translate takes a single translation table)
>>> %timeit from string import digits;strs.translate(None, digits)
10 loops, best of 3: 20.4 ms per loop
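The timings above come from Python 2, where strs.translate(None, digits) is valid; in Python 3 that call does not work. A rough Python 3 sketch for re-running the comparison yourself might look like the following. The names DELETE_DIGITS, DIGIT_RUNS and candidates are illustrative, the input is smaller than the original 10**5 repetitions to keep the run short, and no results are claimed here: they depend on your machine and Python version.

```python
import re
import timeit
from string import digits

# Same sample as above, repeated fewer times to keep the run short.
strs = "c1309, IF1306, v1309, p1209, a1309, mo1309" * 10**3

# Precompute the deletion table and the compiled pattern once.
# str.maketrans's third argument lists characters to delete.
DELETE_DIGITS = str.maketrans('', '', digits)
DIGIT_RUNS = re.compile(r'\d+')

candidates = {
    "regex per digit": lambda: re.sub(r'\d', '', strs),
    "regex digit runs": lambda: DIGIT_RUNS.sub('', strs),
    "join + isdigit": lambda: ''.join([c for c in strs if not c.isdigit()]),
    "str.translate": lambda: strs.translate(DELETE_DIGITS),
}

# Sanity check: every approach gives the same result on this ASCII input.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    # Best of 3 repeats, each timing 10 calls.
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{name:<17} {best / 10 * 1000:.3f} ms per call")
```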
>>> text = 'mo1309'
>>> ''.join([c for c in text if not c.isdigit()])
'mo'
This is faster than a regex (these timings are also from Python 2):
python -m timeit -s "import re; text = 'mo1309'" "re.sub(r'\d','',text)"
100000 loops, best of 3: 3.99 usec per loop
python -m timeit -s "import re; text = 'mo1309'" "''.join([c for c in text if not c.isdigit()])"
1000000 loops, best of 3: 1.42 usec per loop
python -m timeit -s "from string import digits; text = 'mo1309'" "text.translate(None, digits)"
1000000 loops, best of 3: 0.42 usec per loop
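But str.translate, as suggested by @DavidSousa, was the fastest way to strip characters in these benchmarks:
from string import digits
text.translate(None, digits)  # Python 2 only
In Python 3, str.translate takes a single translation table, which you can build with str.maketrans.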
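A minimal Python 3 sketch of the translate approach is below; the helper name strip_ascii_digits is hypothetical. It also shows a caveat the benchmarks above do not cover: on non-ASCII input the three techniques are not equivalent, because string.digits holds only the ASCII digits 0-9, \d matches Unicode decimal digits, and str.isdigit additionally accepts characters such as superscript digits.

```python
import re
from string import digits

# Python 3 replacement for Python 2's text.translate(None, digits):
# the third argument of str.maketrans lists characters to delete.
DELETE_ASCII_DIGITS = str.maketrans('', '', digits)

def strip_ascii_digits(text: str) -> str:
    """Remove the ten ASCII digits 0-9 from text (hypothetical helper)."""
    return text.translate(DELETE_ASCII_DIGITS)

print(strip_ascii_digits('mo1309'))  # mo

# 'a', '1', SUPERSCRIPT TWO, ARABIC-INDIC DIGIT THREE
sample = "a1\u00b2\u0663"
print(ascii(strip_ascii_digits(sample)))                     # 'a\xb2\u0663'
print(ascii(re.sub(r'\d', '', sample)))                      # 'a\xb2'
print(ascii(''.join(c for c in sample if not c.isdigit())))  # 'a'
```

For plain ASCII strings like the ones in the question, all three give the same result.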
Also, itertools supplies a little-known function called ifilterfalse (Python 2; it was renamed filterfalse in Python 3):
>>> from itertools import ifilterfalse
>>> ''.join(ifilterfalse(str.isdigit, text))
'mo'
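In Python 3 the same idea uses itertools.filterfalse, which yields the items for which the predicate is false. A small sketch applying it to every string from the question:

```python
from itertools import filterfalse

strings = ['c1309', 'IF1306', 'v1309', 'p1209', 'a1309', 'mo1309']

# Keep only the characters for which str.isdigit returns False.
stripped = [''.join(filterfalse(str.isdigit, s)) for s in strings]
print(stripped)  # ['c', 'IF', 'v', 'p', 'a', 'mo']
```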