
pandas applying regex to replace values

I have read some pricing data into a pandas DataFrame. The values appear as:

    $40,000*
    $40000 conditions attached

I want to strip them down to just the numeric values. I know I can loop through and apply the regex

[0-9]+ 

to each field and then join the resulting list back together, but is there a way to do this without looping?
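For reference, the loop-based approach described above might look roughly like the sketch below. The names `prices` and `digits_only` are illustrative only and do not come from the original post; the sample values are the two shown earlier.

```python
import re

# Sample values taken from the question.
prices = ['$40,000*', '$40000 conditions attached']

def digits_only(text):
    # re.findall returns every run of digits in the string;
    # joining the runs drops everything that is not a digit.
    return ''.join(re.findall(r'[0-9]+', text))

# Explicit per-value loop (written here as a list comprehension).
cleaned = [digits_only(p) for p in prices]
print(cleaned)
```

This works, but it processes each value in Python code one at a time, which is the looping the question is trying to avoid.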

Thanks

KillerSnail asked Mar 23 '14 07:03




1 Answer

You could use Series.str.replace:

    import pandas as pd

    df = pd.DataFrame(['$40,000*', '$40000 conditions attached'], columns=['P'])
    print(df)
    #                             P
    # 0                    $40,000*
    # 1  $40000 conditions attached

    # Remove every run of non-digit characters, then convert to integers.
    df['P'] = df['P'].str.replace(r'\D+', '', regex=True).astype('int')
    print(df)

yields

           P
    0  40000
    1  40000

since \D matches any character that is not a decimal digit.
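To see what the `\D+` pattern removes on its own, here is a small sketch using only the standard-library `re` module on plain strings. The `samples` and `stripped` names are illustrative, not part of the answer.

```python
import re

# The two sample values from the question.
samples = ['$40,000*', '$40000 conditions attached']

# \D+ matches one or more consecutive non-digit characters;
# replacing each match with '' leaves only the digits.
stripped = [re.sub(r'\D+', '', s) for s in samples]

for before, after in zip(samples, stripped):
    print(repr(before), '->', repr(after))
```

One thing to keep in mind: a decimal point is also a non-digit, so a value such as `'$40.50'` would come out as `'4050'`. If the data may contain fractional prices, the pattern would need adjusting.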

unutbu answered Oct 26 '22 20:10