I have an input string like this: a1b2c30d40
and I want to tokenize it into: a, 1, b, 2, c, 30, d, 40.
I know I can read the string one character at a time and keep track of the previous character to decide whether to start a new token (two digits in a row belong to the same token), but is there a more Pythonic way of doing this?
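For comparison, the character-by-character approach described in the question might look like the sketch below. The name tokenize_manual is hypothetical. Following the question's rule, it merges only consecutive digits, so every non-digit character becomes its own token.

```python
def tokenize_manual(s: str) -> list[str]:
    """Split s into tokens, merging runs of consecutive digits (sketch)."""
    tokens: list[str] = []
    prev = ""  # previous character; empty before the first iteration
    for ch in s:
        if ch.isdigit() and prev.isdigit():
            # Two digits in a row: extend the current number token.
            tokens[-1] += ch
        else:
            # Anything else starts a new token.
            tokens.append(ch)
        prev = ch
    return tokens


print(tokenize_manual("a1b2c30d40"))
# ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```

Note that str.isdigit() also accepts some non-ASCII digit characters, which may or may not be what you want.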
>>> import re
>>> re.split(r'(\d+)', 'a1b2c30d40')
['a', '1', 'b', '2', 'c', '30', 'd', '40', '']
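re.split leaves an empty string at the end of the result because the input ends with a digit run (and it would leave one at the start if the input began with digits). A minimal sketch that filters those out, assuming you never want empty tokens; the function name tokenize is hypothetical:

```python
import re


def tokenize(s: str) -> list[str]:
    # The capturing group keeps each digit run in the result;
    # "if t" drops the empty strings re.split produces when the
    # input starts or ends with a run of digits.
    return [t for t in re.split(r"(\d+)", s) if t]


print(tokenize("a1b2c30d40"))
# ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```

Unlike the character-by-character rule, this keeps runs of non-digits together, so "ab12" gives ['ab', '12'].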
On the pattern: \d means "match one digit", and + is a modifier that means "match one or more", so \d+ means "match as many digits as possible". This is put into a group (), so the entire pattern, in the context of re.split, means "split this string using each run of digits as the separator, additionally capturing the matched separators into the result". If you omitted the group, you'd get ['a', 'b', 'c', 'd', '']. The trailing empty string appears because the input ends with a separator.
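To see the effect of the capturing group side by side, here is a small comparison. The last line uses re.findall with the pattern \d+|\D+, which is a different technique from the answer above: it returns the matches themselves rather than the pieces between separators, so no empty strings appear.

```python
import re

s = "a1b2c30d40"

# Without a group: digit runs are consumed as separators and discarded.
print(re.split(r"\d+", s))    # ['a', 'b', 'c', 'd', '']

# With a capturing group: the matched digit runs are kept in the result.
print(re.split(r"(\d+)", s))  # ['a', '1', 'b', '2', 'c', '30', 'd', '40', '']

# Alternative: findall returns every match of "a digit run OR a non-digit run".
print(re.findall(r"\d+|\D+", s))  # ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```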