Extract Letters and the first Digit only

I am working with a data frame that contains letters, special characters, and digits. My goal is to extract all letters and the first digit. The digits always occur at the end, after the letters and special characters; however, some letters may appear after special characters. See the example below:

import pandas as pd

d = {'col1': ['A./B. 1234', 'CDEF/G5.', 'AB./C23']}
df = pd.DataFrame(data=d)
print(df)
#    col1
# 0  A./B. 1234
# 1  CDEF/G5.
# 2  AB./C23

I looked up many variants, but I do not know how to handle special characters such as . and / and the like.

df.col1.str.extract(r'([A-Za-z\d]+)')
#    0
# 0  A
# 1  CDEF
# 2  AB

This gives me all the letters and digits until it reaches a special character. Eventually I would like to get the following output:

AB1
CDEFG5
ABC2

I am new to regex.

Rob asked Dec 21 '25 18:12


2 Answers

You need to extract all the characters up to and including the first digit, and then replace any non-letter/digit characters with an empty string:

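import pandas as pd

d = {'col1': ['A./B. 1234', 'CDEF/G5.', 'AB./C23']}
df = pd.DataFrame(data=d)
# Capture everything up to and including the first digit, then strip non-alphanumerics.
df.col1.str.extract(r'^([^\d]+\d)').replace('[^A-Za-z0-9]', '', regex=True)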

Output:

        0
0     AB1
1  CDEFG5
2    ABC2
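
To see what each of the two steps contributes, the intermediate result can be inspected on its own. This is only a sketch on the sample data from the question: the names `prefix` and `cleaned` are mine, and `\D` is used as shorthand for `[^\d]`.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A./B. 1234', 'CDEF/G5.', 'AB./C23']})

# Step 1: capture everything from the start up to and including the first digit.
# expand=False returns a Series instead of a one-column DataFrame.
prefix = df['col1'].str.extract(r'^(\D+\d)', expand=False)
print(prefix.tolist())   # ['A./B. 1', 'CDEF/G5', 'AB./C2']

# Step 2: drop every character that is not a letter or a digit.
cleaned = prefix.str.replace(r'[^A-Za-z0-9]', '', regex=True)
print(cleaned.tolist())  # ['AB1', 'CDEFG5', 'ABC2']
```

Note that a row containing no digit at all would not match in step 1 and would come back as NaN.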
Nick answered Dec 24 '25 07:12


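Another method: extract every letter and digit as its own row, keep the letters plus any digit that directly follows a letter, then join the pieces back together per original row.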

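s = df['col1'].str.extractall("([a-zA-Z0-9])")[0]
# Series.sum(level=0) was removed in pandas 2.0, so group on the outer index level and join instead.
# shift(fill_value='') avoids a NaN in the first position of the mask.
s[s.str.isalpha() | s.shift(fill_value='').str.isalpha()].groupby(level=0).agg(''.join)
0       AB1
1    CDEFG5
2      ABC2
Name: 0, dtype: object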
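
The same filtering rule can be written without pandas, which may make the logic easier to follow. This is a minimal sketch, not part of the original answer: the helper name `letters_and_first_digit` is hypothetical, and it relies on the question's assumption that all digits sit at the end of the string.

```python
import re

def letters_and_first_digit(text):
    """Keep all letters, plus any digit that directly follows a letter
    once special characters have been discarded."""
    # Keep only letters and digits, one character per list item.
    chars = re.findall(r'[A-Za-z0-9]', text)
    kept = [
        c for i, c in enumerate(chars)
        if c.isalpha() or (i > 0 and chars[i - 1].isalpha())
    ]
    return ''.join(kept)

samples = ['A./B. 1234', 'CDEF/G5.', 'AB./C23']
results = [letters_and_first_digit(s) for s in samples]
print(results)  # ['AB1', 'CDEFG5', 'ABC2']
```

It could be applied to the column with `df['col1'].map(letters_and_first_digit)`. If letters and digits were interleaved, this rule would keep every digit that follows a letter, not just the first digit overall.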
BENY answered Dec 24 '25 08:12


