How to extract only one capture group using regex for pandas dataframe?

Python beginner here. I am struggling to use regex with pandas. I have rows like this that need to be split up into columns containing only the numbers.

rando45m text78 here 123  $    1   0% text here  5 . 6&

I need it to be displayed as

     0    1    2   3 
0   123   1    0   5

I have tried the following two methods:

df2 = df.Keep.str.extractall('(\d+)((\s+)|(\%))')

df3 = df.Keep.str.extractall(r'(?<=\s)(\d+)(?=\s+|\%)')

df2 includes the whitespace in the cell. df3 fails with an assertion error. Is there a way to capture only one group (\1) for my dataframe?

Thanks

Ppoc asked Jan 30 '26 01:01



1 Answer

Try this:

In [39]: df
Out[39]:
                                                      Keep
0  rando45m text78 here 123  $    1   0% text here  5 . 6&
1         aaa 101.5% here 123  $    1   0% text here  55 .

In [40]: df.Keep.str.extractall(r'\b(\d+(?:\.\d+)?)(?:\s|%|$)').unstack()
Out[40]:
           0
match      0    1  2  3     4
0        123    1  0  5  None
1      101.5  123  1  0    55
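
For reference, here is a self-contained sketch of the same idea that can be run outside IPython. It rebuilds the two sample rows by hand (the original df is not shown in full, so the construction is an assumption) and uses the names pattern, matches and wide purely for illustration. The point it demonstrates: the number sits in the only capturing group, while the optional decimal part and the trailing delimiter are wrapped in non-capturing groups (?:...), so extractall returns a single column per match.

```python
import pandas as pd

# Sample data rebuilt from the rows shown above (assumed construction).
df = pd.DataFrame({
    "Keep": [
        "rando45m text78 here 123  $    1   0% text here  5 . 6&",
        "aaa 101.5% here 123  $    1   0% text here  55 .",
    ]
})

# \b             : the number must start at a word boundary, so "45" in
#                  "rando45m" and "78" in "text78" are skipped
# (\d+(?:\.\d+)?) : the only capturing group, an integer or a decimal
# (?:\s|%|$)     : whitespace, "%" or end of string must follow;
#                  non-capturing, so it adds no extra column
pattern = r"\b(\d+(?:\.\d+)?)(?:\s|%|$)"

# extractall returns one row per match, indexed by (original row, match number).
matches = df["Keep"].str.extractall(pattern)

# Select the single captured column and pivot the match number into columns.
wide = matches[0].unstack()
print(wide)
```

The extracted values are strings, and rows with fewer matches are padded with missing values. If numeric columns are needed, something like wide.apply(pd.to_numeric) should convert them.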
MaxU - stop WAR against UA answered Jan 31 '26 15:01



