
Transforming dataframe string categories to numbers

Tags:

python

pandas

I have some columns in my dataframe that look like:

 total
  NaN
26-27
52-53
88-89
  165
  280
  399
  611
  962
 1407
 1937

I would like to convert them to numeric values, keeping the upper bound of each range:

 total
  NaN
   27
   53
   89
  165
  280
  399
  611
  962
 1407
 1937

Clearly, pd.to_numeric() does not work, because values such as 26-27 are strings (the column has dtype object). I can convert them one by one, but is there an elegant and fast way to do the transformation?

Mike asked Nov 17 '25 08:11


1 Answer

IIUC, a little regex can extract the last run of digits in each value, i.e. the number that sits right before the end of the string.

$ asserts the position at the end of the string (or just before a trailing \n)

\d matches a digit (equal to [0-9])

+ is a quantifier: it matches the preceding token between one and unlimited times, as many times as possible
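
To see what the pattern captures before involving pandas, here is a minimal sketch using the standard re module. The sample strings are copied from the question; the names samples, pattern and trailing are only illustrative.

```python
import re

# A few values copied from the question's column.
samples = ["26-27", "52-53", "88-89", "165", "1937"]

# (\d+)$ captures the run of digits that ends the string.
pattern = re.compile(r"(\d+)$")

# search() scans the string; group(1) is the captured trailing number.
trailing = [pattern.search(s).group(1) for s in samples]
print(trailing)  # ['27', '53', '89', '165', '1937']
```

For a range such as 26-27 only the part after the hyphen is captured, and a plain number such as 165 is captured whole.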

df['total'].str.extract(r'(\d+)$', expand=False).astype(float)
out:
0        NaN
1       27.0
2       53.0
3       89.0
4      165.0
5      280.0
6      399.0
7      611.0
8      962.0
9     1407.0
10    1937.0
Name: total, dtype: float64
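
For a self-contained version, the sketch below rebuilds part of the column from the question and writes the result back. It assumes every non-missing value is stored as a string, which the question does not state explicitly; the name upper is only illustrative.

```python
import numpy as np
import pandas as pd

# Part of the column from the question; non-missing values are assumed to be strings.
df = pd.DataFrame({"total": [np.nan, "26-27", "52-53", "88-89", "165", "280"]})

# expand=False returns a Series rather than a one-column DataFrame.
upper = df["total"].str.extract(r"(\d+)$", expand=False).astype(float)

# Overwrite the original column with the numeric values.
df["total"] = upper
print(df)
```

If the column mixes real integers with strings, the .str accessor yields NaN for the non-string entries, so converting with df['total'].astype(str) first may be needed in that case.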

Umar.H answered Nov 18 '25 20:11


