
Pandas Dataframe: Extract numerical values (including decimals) from string

Tags:

python

pandas

I have a dataframe with a column of strings, and I want to extract the numeric values from those strings. However, some of the values are in metres and some are in kilometres. How do I detect whether there is an "m" or a "km" beside the number, standardize the units, and then extract the numbers to a new column?

details                 numbers
Distance                350m
Longest straight        860m
Top speed               305km
Full throttle           61 per cent

Desired output:

details                 numbers
Distance                350
Longest straight        860
Top speed               305000
Full throttle           61
doyz asked Oct 15 '25 03:10


1 Answer

Use:

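# Boolean mask of the rows whose value is given in kilometres
m = df['numbers'].str.contains(r'\d+km')
# Keep the first run of digits in each string and cast it to int
df['numbers'] = df['numbers'].str.extract(r'(\d+)', expand=False).astype(int)
# Convert the kilometre rows to metres
df.loc[m, 'numbers'] *= 1000

print(df)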
            details  numbers
0          Distance      350
1  Longest straight      860
2         Top speed   305000
3     Full throttle       61

Explanation:

  1. Build a boolean mask for the km values with str.contains.
  2. Extract the integer values with str.extract and cast them to int.
  3. Multiply the masked km values by 1000 to convert them to metres.
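
For readers who want to run the three steps end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (the construction of df is an assumption, since the question only shows the printed table) and uses Series.mask instead of the in-place multiplication, which gives the same result here:

```python
import pandas as pd

# Sample data copied from the question; how the real frame is built is an assumption.
df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["350m", "860m", "305km", "61 per cent"],
})

# Step 1: boolean mask of the rows whose value is in kilometres.
is_km = df["numbers"].str.contains(r"\d+km")

# Step 2: keep the first run of digits and cast it to int.
values = df["numbers"].str.extract(r"(\d+)", expand=False).astype(int)

# Step 3: multiply only the kilometre rows by 1000 to get metres.
df["numbers"] = values.mask(is_km, values * 1000)

print(df)
```

Building the mask before overwriting the column matters: once the strings are replaced by integers, the unit information is gone.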

EDIT: To extract float values, change the regex passed to extract so that it also matches decimals, and cast to float at the end:

print(df)
            details      numbers
0          Distance        1.7km
1  Longest straight       860.8m
2         Top speed        305km
3     Full throttle  61 per cent

m = df['numbers'].str.contains(r'\d+km')
# Try decimals such as 1.7 or .5 first, then fall back to plain integers
df['numbers'] = df['numbers'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)
df.loc[m, 'numbers'] *= 1000
print(df)
            details   numbers
0          Distance    1700.0
1  Longest straight     860.8
2         Top speed  305000.0
3     Full throttle      61.0
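
If more units are likely to appear later, one possible variation is to capture the unit itself and look up a conversion factor. This is only a sketch, not part of the answer above: the UNIT_FACTORS table and the intermediate names are hypothetical, and it assumes a value with no recognised unit should be left unscaled.

```python
import pandas as pd

# Hypothetical lookup table: factor that converts each unit to metres.
UNIT_FACTORS = {"km": 1000.0, "m": 1.0}

df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["1.7km", "860.8m", "305km", "61 per cent"],
})

# Numeric part: decimals first, then plain integers.
value = df["numbers"].str.extract(r"(\d*\.\d+|\d+)", expand=False).astype(float)

# Unit part: "km" or "m" directly after a digit and ending at a word boundary,
# so strings like "61 per cent" yield no unit (NaN).
unit = df["numbers"].str.extract(r"\d\s*(km|m)\b", expand=False)

# Rows without a recognised unit keep a factor of 1 (an assumption).
factor = unit.map(UNIT_FACTORS).fillna(1.0)

df["numbers"] = value * factor
print(df)
```

Adding another unit then only means adding an entry to the table and to the alternation in the unit regex.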
jezrael answered Oct 16 '25 16:10


