
Splitting letters from numbers within a string

I'm processing strings like this: "125A12C15". I need to split them at the boundaries between letters and numbers; for example, this one should become ["125","A","12","C","15"].

Is there a more elegant way to do this in Python than going through the string position by position, checking whether each character is a letter or a number, and concatenating accordingly? For example, is there a built-in function or module for this kind of thing?

Thanks for any pointers!

CodingCat asked Mar 22 '13 14:03



1 Answer

Use itertools.groupby together with the str.isalpha method:

Docstring (itertools.groupby):

groupby(iterable[, keyfunc]) -> create an iterator which returns (key, sub-iterator) grouped by each value of key(value).


Docstring (str.isalpha):

S.isalpha() -> bool

Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.


In [1]: from itertools import groupby

In [2]: s = "125A12C15"

In [3]: [''.join(g) for _, g in groupby(s, str.isalpha)]
Out[3]: ['125', 'A', '12', 'C', '15']
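
One caveat with the groupby approach above: str.isalpha only separates letters from everything else, so punctuation or whitespace lands in the same run as adjacent digits. A minimal sketch, assuming you want to wrap the one-liner in a helper (the name split_runs is made up for illustration) and optionally swap the key function:

```python
from itertools import groupby


def split_runs(s, key=str.isalpha):
    """Split s into maximal runs of characters that share the same key value."""
    # groupby starts a new group every time key(char) changes.
    return ["".join(group) for _, group in groupby(s, key)]


print(split_runs("125A12C15"))              # ['125', 'A', '12', 'C', '15']
print(split_runs("12-A"))                   # ['12-', 'A']: '-' is grouped with the digits
print(split_runs("12-A", key=str.isdigit))  # ['12', '-A']: now '-' joins the letter
```

Which key fits best depends on what the input can contain; for strings made only of letters and digits, both keys give the same result.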

Or possibly re.findall or re.split from the regular expressions module:

In [4]: import re

In [5]: re.findall(r'\d+|\D+', s)
Out[5]: ['125', 'A', '12', 'C', '15']

In [6]: re.split(r'(\d+)', s)  # note that you may have to filter out the empty
                               # strings at the start/end if using re.split
Out[6]: ['', '125', 'A', '12', 'C', '15', '']

In [7]: re.split(r'(\D+)', s)
Out[7]: ['125', 'A', '12', 'C', '15']
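
If you go the re.split route, the empty strings can be filtered out in the same expression. A rough sketch (split_alnum and letters_and_digits are hypothetical helper names, not part of the re module); the second helper assumes you only care about digits and ASCII letters and want to skip anything else:

```python
import re


def split_alnum(s):
    """Split at digit/non-digit boundaries, dropping empty edge strings."""
    # The capturing group makes re.split keep the digit runs in the result.
    return [part for part in re.split(r"(\d+)", s) if part]


def letters_and_digits(s):
    """Return only runs of digits or ASCII letters; other characters are skipped."""
    return re.findall(r"\d+|[A-Za-z]+", s)


print(split_alnum("125A12C15"))       # ['125', 'A', '12', 'C', '15']
print(split_alnum("A12"))             # ['A', '12']
print(letters_and_digits("12-AB 7"))  # ['12', 'AB', '7']
```

Note that \D matches any non-digit, so split_alnum keeps punctuation attached to the letter runs, while letters_and_digits discards it.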

As for performance, the regex versions were faster than groupby in a quick timing on a 9,000-character input:

In [8]: %timeit re.findall(r'\d+|\D+', s*1000)
100 loops, best of 3: 2.15 ms per loop

In [9]: %timeit [''.join(g) for _, g in groupby(s*1000, str.isalpha)]
100 loops, best of 3: 8.5 ms per loop

In [10]: %timeit re.split(r'(\d+)', s*1000)
1000 loops, best of 3: 1.43 ms per loop
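
%timeit is an IPython magic; outside IPython, the same comparison can be approximated with the standard timeit module. This is only a sketch: the repeat counts are arbitrary choices, and the absolute numbers will vary with machine and Python version, so no timings are shown here.

```python
import re
import timeit
from itertools import groupby

s = "125A12C15" * 1000  # same 9,000-character input as the timings above

candidates = {
    "re.findall": lambda: re.findall(r"\d+|\D+", s),
    "re.split": lambda: re.split(r"(\d+)", s),
    "groupby": lambda: ["".join(g) for _, g in groupby(s, str.isalpha)],
}

calls = 20  # arbitrary; raise it for more stable numbers
results = {}
for name, func in candidates.items():
    # Best of 3 repeats, each running `calls` calls, reported per call.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    results[name] = best / calls
    print(f"{name:12s} {results[name] * 1e3:.3f} ms per call")
```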
root answered Sep 21 '22 19:09