I have a dataframe with a column of strings. I want to extract the numbers from these strings. However, some of the values are in metres and some are in kilometres. How do I detect whether there is an "m" or "km" beside the number, standardize the units, and then extract the numbers to a new column?
details numbers
Distance 350m
Longest straight 860m
Top speed 305km
Full throttle 61 per cent
Desired output:
details numbers
Distance 350
Longest straight 860
Top speed 305000
Full throttle 61
Use:
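# Boolean mask of the rows whose value is given in km
m = df['numbers'].str.contains(r'\d+km')
# Extract the first run of digits and cast it to int
df['numbers'] = df['numbers'].str.extract(r'(\d+)', expand=False).astype(int)
# Convert the km rows to metres
df.loc[m, 'numbers'] *= 1000
print(df)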
details numbers
0 Distance 350
1 Longest straight 860
2 Top speed 305000
3 Full throttle 61
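Explanation:
First get a boolean mask of the km values with contains.
Then extract the integer values with extract and cast them to int.
Finally, correct the km values by multiplying them by 1000.

EDIT: To extract float values, change the regex in extract and cast to float at the end: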
print (df)
details numbers
0 Distance 1.7km
1 Longest straight 860.8m
2 Top speed 305km
3 Full throttle 61 per cent
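# Boolean mask of the rows whose value is given in km
m = df['numbers'].str.contains(r'\d+km')
# Extract a decimal or integer number and cast it to float
df['numbers'] = df['numbers'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)
# Convert the km rows to metres
df.loc[m, 'numbers'] *= 1000
print(df)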
details numbers
0 Distance 1700.0
1 Longest straight 860.8
2 Top speed 305000.0
3 Full throttle 61.0
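If more units might appear later, one possible variation is to capture the number and the unit in a single extract call and look up the conversion factor in a dict. This is only a sketch: the column name metres, the group names value and unit, and the factor table are my own choices for illustration, and it assumes the unit directly follows the number (optionally after whitespace).

```python
import pandas as pd

df = pd.DataFrame({
    "details": ["Distance", "Longest straight", "Top speed", "Full throttle"],
    "numbers": ["1.7km", "860.8m", "305km", "61 per cent"],
})

# Capture the number and an optional unit (km or m) that follows it.
# The trailing \b stops "m" from matching the start of a longer word.
parts = df["numbers"].str.extract(r"(?P<value>\d*\.\d+|\d+)\s*(?P<unit>km|m)?\b")

# Conversion factors to metres (assumed table); rows without a unit keep factor 1.
factor = parts["unit"].map({"km": 1000, "m": 1}).fillna(1)

# New column with every value expressed in metres.
df["metres"] = parts["value"].astype(float) * factor
print(df)
```

Adding another unit then only needs a new alternative in the regex and a new entry in the dict, instead of a separate mask per unit.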