 

How to use regular expression to separate numbers and characters in strings like "30M1000N20M"

Tags:

python

regex

I'm trying to separate the digits ([0-9]) from the letters ([A-Z]) in strings like these:

100M
20M1D80M
20M1I79M
20M10000N80M

I tried using Python's re module. Here is the code I used:

>>> import re
>>> num_alpha = re.compile('(([0-9]+)([A-Z]))+')
>>> str1 = "100M"
>>> n_a_match = num_alpha.match(str1)
>>> n_a_match.group(2), n_a_match.group(3)
('100', 'M')  # just what I want

>>> str1 = "20M10000N80M"
>>> n_a_match = num_alpha.match(str1)
>>> n_a_match.groups()
('80M', '80', 'M')  # only the last one; how can I get the first two?
# expected result: ('20M', '20', 'M', '10000N', '10000', 'N', '80M', '80', 'M')

This regular expression works for strings that contain a single number-letter pair, but when a string contains several pairs, only the last one is returned. How can I get all of them using regular expressions?

asked Feb 27 '13 at 01:02 by ct586




1 Answer

I suggest using re.findall, which returns every non-overlapping match in the string instead of only the first one. If you intend to iterate over the results rather than build a list, you could use re.finditer instead. Here's how re.findall works on your string:

>>> re.findall("(([0-9]+)([A-Z]))", "20M10000N80M")
[('20M', '20', 'M'), ('10000N', '10000', 'N'), ('80M', '80', 'M')]
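For comparison, here is a minimal sketch of both behaviours on the question's input. In Python's re module, a capturing group under a quantifier such as + is overwritten on each repetition, so match() reports only the final pair even though the overall match spans the whole string. re.finditer instead yields one match object per pair, which you can consume lazily in a loop. The names s, repeated and pair_pattern are illustrative only.

```python
import re

s = "20M10000N80M"

# The question's pattern: a quantified group keeps only its final repetition.
repeated = re.match(r'(([0-9]+)([A-Z]))+', s)
print(repeated.group(0))  # 20M10000N80M  (the match spans the whole string)
print(repeated.groups())  # ('80M', '80', 'M')

# finditer scans left to right and yields a separate match object per pair,
# so each pair's groups are available without building a list first.
pair_pattern = re.compile(r'([0-9]+)([A-Z])')
for m in pair_pattern.finditer(s):
    print(m.group(1), m.group(2))
# Output of the loop:
# 20 M
# 10000 N
# 80 M
```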

If you don't want the combined number+letter string, you can remove the outer parentheses from the pattern and get just the separate parts:

>>> re.findall("([0-9]+)([A-Z])", "20M10000N80M")
[('20', 'M'), ('10000', 'N'), ('80', 'M')]
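If the numbers are needed as integers, the tuples returned by findall can be converted in a list comprehension. This is a minimal sketch that assumes every pair is one or more digits followed by a single uppercase letter, as in the question's examples; the helper name split_counts is hypothetical.

```python
import re

# Same two-group pattern as above: a run of digits, then one uppercase letter.
PAIR = re.compile(r'([0-9]+)([A-Z])')

def split_counts(s):
    """Hypothetical helper: return (int, letter) pairs for a string like '20M1D80M'."""
    return [(int(num), letter) for num, letter in PAIR.findall(s)]

print(split_counts("20M10000N80M"))  # [(20, 'M'), (10000, 'N'), (80, 'M')]
print(split_counts("100M"))          # [(100, 'M')]
```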

Or, if you don't want tuples at all (and you don't need to worry about malformed input, such as strings with several letters in a row), you could change the pattern to an alternation and get the values one by one:

>>> re.findall("([0-9]+|[A-Z])", "20M10000N80M")
['20', 'M', '10000', 'N', '80', 'M']
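If malformed input is a concern, one option is to check the whole string with re.fullmatch before splitting it with the alternation. Without that check, a string such as "20MM80M" would quietly yield ['20', 'M', 'M', '80', 'M']. The sketch below assumes that a well-formed string is one or more digit-run plus single-letter pairs; the names TOKEN, WELL_FORMED and tokenize are hypothetical.

```python
import re

# Alternation from the answer: a run of digits or a single uppercase letter.
TOKEN = re.compile(r'[0-9]+|[A-Z]')
# Assumed format: the whole string is one or more (digits + one letter) pairs.
WELL_FORMED = re.compile(r'(?:[0-9]+[A-Z])+')

def tokenize(s):
    """Hypothetical helper: split into number/letter tokens, rejecting malformed input."""
    if WELL_FORMED.fullmatch(s) is None:
        raise ValueError(f"malformed string: {s!r}")
    return TOKEN.findall(s)

print(tokenize("20M10000N80M"))  # ['20', 'M', '10000', 'N', '80', 'M']

try:
    tokenize("20MM80M")  # two letters in a row
except ValueError as e:
    print(e)  # malformed string: '20MM80M'
```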
answered Oct 27 '22 at 19:10 by Blckknght