Replacing a value in all cells of a DataFrame in Python

I have an example df:

df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})

        A          B
0   100,100    200,100,100
1   200,200    100

and I want to replace the commas ',' with nothing (basically, remove them). You can probably guess a real-world application, as a lot of data is written with thousands separators. Feel free to introduce me to a better method.

I read the documentation for DataFrame.replace() and tried several versions of code. None of them raises an error, but none of them modifies my DataFrame.

df = df.replace(',','')
df = df.replace({',': ''})
df = df.replace([','],'')
df = df.replace([','],[''])

I can get it working when specifying the column names and using the ".str.replace()" method for Series, but imagine having 20 columns. I also can get this working specifying columns in the df.replace() method but there must be a more convenient way for such an easy task. I could write a custom function, but pandas is such an amazing library it must be something I am missing.

This works:

df['A'] = df['A'].str.replace(',','')

Thank you!

JachymDvorak asked Oct 21 '25 09:10


2 Answers

df.replace has a regex parameter; set it to True for partial matches.

By default, regex is False. When it is False, only cells whose entire value matches to_replace exactly are replaced.

From Pandas docs:

str: string exactly matching to_replace will be replaced with the value.

df.replace(',', '', regex=True)

        A          B
0  100100  200100100
1  200200        100
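
To make the difference between exact and partial matching concrete, here is a minimal sketch. It reuses the question's DataFrame but, as an assumption made purely for illustration, changes the last cell of B to a lone ',' so that the exact-match case has something to act on.

```python
import pandas as pd

# Same shape as the question's data; the lone ',' in B is a
# hypothetical value added only to show what an exact match looks like.
df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', ',']})

# regex=False (the default): a cell is replaced only if its whole value is ','.
exact = df.replace(',', '')

# regex=True: every ',' found inside a string cell is replaced.
partial = df.replace(',', '', regex=True)

print(exact)
print(partial)
```

Note that replace returns a new DataFrame in both cases, so the result has to be assigned (as in df = df.replace(...)) for the change to persist.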

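In pd.Series.str.replace, the regex param defaulted to True in pandas versions before 2.0; since pandas 2.0 the default is False. Either way it replaces substrings within each value, which is why the .str.replace() call in the question removes the commas.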

From docs:

Equivalent to str.replace() or re.sub(), depending on the regex value.

Determines if the passed-in pattern is a regular expression:

  • If True, assumes the passed-in pattern is a regular expression.
  • If False, treats the pattern as a literal string.
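
Because the default for regex has differed between pandas versions, it may be safer to pass it explicitly. The sketch below treats the separator as a literal string; the Series values, including the dot-separated one, are made up for illustration.

```python
import pandas as pd

# Hypothetical values: one with comma separators, one with dot separators.
s = pd.Series(['100,100', '1.000.000'])

# regex=False treats the pattern as a literal string, so '.' means
# an actual dot rather than "any character".
no_commas = s.str.replace(',', '', regex=False)
no_dots = s.str.replace('.', '', regex=False)

print(no_commas.tolist())
print(no_dots.tolist())
```

For a plain comma the two modes give the same result, but for characters that have a special meaning in regular expressions, such as '.', the literal mode is usually what is wanted.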
Ch3steR answered Oct 23 '25 23:10


Though your immediate question has probably been answered, I wanted to mention that if you are reading this data in from a CSV file, you can pass thousands="," to pd.read_csv. The parser then treats the comma as a thousands separator, strips it, and parses the values as numbers:

import io
import pandas as pd

csv_file = io.StringIO("""
A,B,C
"1,000","2,000","3,000"
1,2,3
"50,000",50,5
""")

df = pd.read_csv(csv_file, thousands=",")

print(df)
       A     B     C
0   1000  2000  3000
1      1     2     3
2  50000    50     5

print(df.dtypes)
A    int64
B    int64
C    int64
dtype: object
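
If the data is already in a DataFrame rather than being read from a file, a similar result can be approximated by stripping the separators and converting each column afterwards. This is only a sketch: it assumes every column holds strings, and strip_thousands is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})

def strip_thousands(column):
    """Remove literal commas from a string column and convert it to numbers."""
    return pd.to_numeric(column.str.replace(',', '', regex=False))

# apply() calls the helper once per column, so no column names are needed.
numeric_df = df.apply(strip_thousands)

print(numeric_df)
print(numeric_df.dtypes)
```

Columns that are not strings would raise an error on the .str accessor, so a mixed DataFrame would need the string columns selected first.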
Cameron Riddell answered Oct 24 '25 00:10