I have a list of product codes in a text file. Each line holds one product code, and the codes look like this:

abcd2343
abw34324
abc3243-23A

So each code is letters followed by numbers and other characters. I want to split on the first occurrence of a number.


Product code looks like abcd2343, how to split by letters and numbers?

2 Answers

import re
s = 'abcd2343 abw34324 abc3243-23A'
re.split(r'(\d+)', s)
# ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']

Or, if you want to split on the first occurrence of a digit:

re.findall(r'\d*\D+', s)
# ['abcd', '2343 abw', '34324 abc', '3243-', '23A']

\d+ matches 1-or-more digits.
\d*\D+ matches 0-or-more digits followed by 1-or-more non-digits.
\d+|\D+ matches 1-or-more digits or 1-or-more non-digits.

Consult the documentation of Python's re module for more about its regex syntax.
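For the question as asked (one code per line, split at the first run of digits), a minimal sketch is to pass `maxsplit=1` to `re.split`, so that only the first match is used as the delimiter. The helper names `split_code` and `split_codes` are hypothetical, and the sketch assumes you iterate over the file's lines (an open file object works as the iterable).

```python
import re

def split_code(code):
    # Split one product code at its FIRST run of digits only.
    # maxsplit=1 stops after one match, so later digits stay in the tail.
    # Returns [prefix, digits, rest]; a code with no digits returns [code].
    return re.split(r'(\d+)', code.strip(), maxsplit=1)

def split_codes(lines):
    # `lines` is assumed to be any iterable of strings, e.g. an open file.
    # Blank lines are skipped.
    return [split_code(line) for line in lines if line.strip()]

print(split_codes(['abcd2343\n', 'abw34324\n', 'abc3243-23A\n']))
# [['abcd', '2343', ''], ['abw', '34324', ''], ['abc', '3243', '-23A']]
```

Note the trailing empty string when a code ends right after its first digit run; that is the same re.split behaviour discussed further down.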

re.split(pat, s) splits the string s using pat as the delimiter. If pat contains a capturing group (part of the pattern wrapped in plain parentheses), re.split also returns the substrings matched by that group. For instance, compare:

re.split(r'\d+', s)
# ['abcd', ' abw', ' abc', '-', 'A']  <-- just the non-matching parts

re.split(r'(\d+)', s)
# ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']  <-- both the non-matching parts and the captured groups
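Because the pattern has exactly one capturing group, the list returned by `re.split(r'(\d+)', s)` alternates: text between matches sits at even indices and the captured digits at odd indices. A short sketch that separates the two by slicing (the variable names are just for illustration):

```python
import re

s = 'abcd2343 abw34324 abc3243-23A'
parts = re.split(r'(\d+)', s)

# One capturing group: even indices are the text between matches,
# odd indices are the captured digit runs.
text_parts = parts[0::2]
digit_parts = parts[1::2]
print(text_parts)   # ['abcd', ' abw', ' abc', '-', 'A']
print(digit_parts)  # ['2343', '34324', '3243', '23']
```

A non-capturing group such as `(?:\d+)` does not add anything to the result, so it behaves like the plain `\d+` pattern.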

In contrast, re.findall(pat, s) returns only the parts of s that match pat:

re.findall(r'\d+', s)
# ['2343', '34324', '3243', '23']
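One related detail: when pat itself contains groups, re.findall returns the groups instead of the whole match (a tuple per match if there is more than one group). That gives letter/number pairs directly, as in this sketch:

```python
import re

s = 'abcd2343 abw34324 abc3243-23A'

# Two groups -> findall returns a (non-digits, digits) tuple for each match.
# The trailing 'A' is not followed by digits, so it is not part of any match.
pairs = re.findall(r'(\D+)(\d+)', s)
print(pairs)
# [('abcd', '2343'), (' abw', '34324'), (' abc', '3243'), ('-', '23')]
```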

Thus, if s ends with a digit, you can avoid a trailing empty string in the result by using re.findall(r'\d+|\D+', s) instead of re.split(r'(\d+)', s):

s = 'abcd2343 abw34324 abc3243-23A 123'

re.split(r'(\d+)', s)
# ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123', '']

re.findall(r'\d+|\D+', s)
# ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123']
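re.split likewise yields an empty string at the front when s starts with a digit. If you prefer to stay with re.split, a minimal alternative sketch is to drop the empty pieces afterwards; `split_runs` is a hypothetical helper name. Since `\d+` is greedy, empty pieces can only appear at the ends, so the filtered list matches the re.findall result.

```python
import re

def split_runs(s):
    # Drop the empty strings re.split produces when s starts or ends
    # with a digit; what remains alternates digit and non-digit runs.
    return [part for part in re.split(r'(\d+)', s) if part]

s = '123abc'
print(re.split(r'(\d+)', s))  # ['', '123', 'abc']
print(split_runs(s))          # ['123', 'abc']
```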

answered Oct 12 '22 02:10

unutbu

This function handles floats and negative numbers as well.

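import re

def separate_number_chars(s):
    # Group 1 matches signed floats, group 2 signed integers; the unmatched group yields None
    res = re.split(r'([-+]?\d+\.\d+)|([-+]?\d+)', s.strip())
    res_f = [r.strip() for r in res if r is not None and r.strip() != '']
    return res_f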

For example:

separate_number_chars('-12.1grams')
# ['-12.1', 'grams']
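When applying this to the product codes from the question, note that the optional `[-+]?` sign means a hyphen directly before a digit is read as a minus sign, so `-23` comes back as one piece rather than as `-` and `23` as in the first answer. A quick check, repeating the function so the snippet runs on its own:

```python
import re

def separate_number_chars(s):
    # Same logic as the function above: signed floats, then signed integers.
    res = re.split(r'([-+]?\d+\.\d+)|([-+]?\d+)', s.strip())
    return [r.strip() for r in res if r is not None and r.strip() != '']

print(separate_number_chars('abc3243-23A'))  # ['abc', '3243', '-23', 'A']
print(separate_number_chars('1.5kg'))        # ['1.5', 'kg']
```

If the hyphens in your codes are separators rather than signs, dropping `[-+]?` from both groups gives `['abc', '3243', '-', '23', 'A']` for the same input.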

answered Oct 12 '22 02:10

Babak Ravandi


Tags:

python

string

split

Blankman
