I have read some pricing data into a pandas dataframe the values appear as:
$40,000* $40000 conditions attached
I want to strip it down to just the numeric values. I know I can loop through and apply regex
[0-9]+
to each field then join the resulting list back together but is there a not loopy way?
Thanks
Regex can be used to perform various tasks in Python. It is used to do a search and replace operations, replace patterns in text, check if a string contains the specific pattern.
replace() Pandas replace() is a very rich function that is used to replace a string, regex, dictionary, list, and series from the DataFrame. The values of the DataFrame can be replaced with other values dynamically. It is capable of working with the Python regex(regular expression). It differs from updating with .
You could use Series.str.replace
:
import pandas as pd df = pd.DataFrame(['$40,000*','$40000 conditions attached'], columns=['P']) print(df) # P # 0 $40,000* # 1 $40000 conditions attached df['P'] = df['P'].str.replace(r'\D+', '', regex=True).astype('int') print(df)
yields
P 0 40000 1 40000
since \D
matches any character that is not a decimal digit.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With