I have an example df:
df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})
        A          B
0   100,100    200,100,100
1   200,200    100
and I want to replace the commas ',' with nothing (basically, remove them). You can probably guess the real-world application, since a lot of data is written with thousands separators. Feel free to point me to a better method.
I read the documentation for df.replace() and tried several versions of the code. None of them raises an error, but none of them modifies my data frame either:
df = df.replace(',','')
df = df.replace({',': ''})
df = df.replace([','],'')
df = df.replace([','],[''])
I can get it working by specifying the column names and using the ".str.replace()" method for Series, but imagine having 20 columns. I can also get it working by specifying columns in the df.replace() method, but there must be a more convenient way for such an easy task. I could write a custom function, but pandas is such an amazing library that I must be missing something.
This works:
df['A'] = df['A'].str.replace(',','')
Thank you!
df.replace has a regex parameter; set it to True to replace partial matches.
By default regex is False, in which case only cells whose entire value equals to_replace are replaced. A cell such as '100,100' is not equal to ',', so nothing changes.
From Pandas docs:
str: string exactly matching to_replace will be replaced with the value.
df.replace(',', '', regex=True)
        A          B
0  100100  200100100
1  200200        100
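To make the difference concrete, here is a minimal sketch on the data frame from the question. The variable names (unchanged, cleaned, numeric) are only illustrative, and the final pd.to_numeric step is an optional addition, assuming you want numbers rather than strings afterwards.

```python
import pandas as pd

df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})

# Without regex=True only cells that are exactly ',' would be replaced,
# so the frame comes back unchanged.
unchanged = df.replace(',', '')

# With regex=True every comma inside each string is removed.
cleaned = df.replace(',', '', regex=True)

# The cells are still strings at this point; convert column by column.
numeric = cleaned.apply(pd.to_numeric)
print(numeric)
```

This handles all columns in one call, so it scales to the 20-column case without listing column names.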
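In pd.Series.str.replace the regex parameter defaulted to True in older pandas versions, which is why the call in the question worked. Since pandas 2.0 the default is False, but a plain ',' is replaced the same way in either mode.
From the docs:
Equivalent to str.replace() or re.sub(), depending on the regex value.
regex: Determines if the passed-in pattern is a regular expression.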
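Because the default for regex in Series.str.replace is not the same in every pandas version, passing it explicitly avoids surprises. The sketch below uses made-up sample values and illustrative variable names; it is one way to do it, not the only one.

```python
import pandas as pd

s = pd.Series(['1,000', '2.500,75'])

# Literal replacement: the comma is treated as an ordinary character.
literal = s.str.replace(',', '', regex=False)

# Regex replacement: '.' has to be escaped, otherwise it matches any character.
no_dots = s.str.replace(r'\.', '', regex=True)

# The same Series method applied to every column of the question's frame.
df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})
all_cols = df.apply(lambda col: col.str.replace(',', '', regex=False))
print(all_cols)
```

The apply version assumes every column holds strings; on a frame with mixed types, df.replace(',', '', regex=True) is the simpler choice.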
Though your immediate question has probably been answered, I wanted to mention that if you are reading this data in from a csv file, you can pass thousands="," to read_csv. The comma is then treated as a thousands separator, so it is stripped and the columns are parsed as integers:
import io
import pandas as pd
csv_file = io.StringIO("""
A,B,C
"1,000","2,000","3,000"
1,2,3
"50,000",50,5
""")
df = pd.read_csv(csv_file, thousands=",")
print(df)
       A     B     C
0   1000  2000  3000
1      1     2     3
2  50000    50     5
print(df.dtypes)
A    int64
B    int64
C    int64
dtype: object