Removing a character from entire data frame

Question

A common operation I need to do with pandas is to read a table from an Excel file and then remove semicolons from all the fields. The columns often have mixed data types, and I run into an AttributeError when trying to do something like this:

for col in cols_to_check:
    df[col] = df[col].map(lambda x: x.replace(';',''))

AttributeError: 'float' object has no attribute 'replace'
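The error arises because a column read from Excel can mix strings with non-string values: numbers, and NaN, which pandas uses for empty cells and which is a float. Below is a minimal sketch of a type-guarded version of the same loop. It assumes Python 3 (where text values are `str`) and that only string cells should be changed. The sample data frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a mixed column with a string, an int and NaN
# (NaN is what pandas uses for an empty Excel cell, and it is a float).
df = pd.DataFrame({'C': ['a;b', 5, np.nan], 'D': ['x;', 'y', 'z;z']})
cols_to_check = ['C', 'D']

for col in cols_to_check:
    # Only call str.replace on actual strings; pass everything else through unchanged.
    df[col] = df[col].map(lambda x: x.replace(';', '') if isinstance(x, str) else x)

print(df)
```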

When I wrap it in str() before replacing, I have problems with Unicode characters, e.g.

for col in cols_to_check:
    df[col] = df[col].map(lambda x: str(x).replace(';',''))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
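The u'\xe9' in this message points to Python 2, where str() on a unicode value encodes it with the ASCII codec. On Python 3 str() handles such characters, but wrapping every value in str() still changes the data: numbers become strings and NaN becomes the literal text 'nan'. A short sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column: accented text, an integer and an empty cell (NaN).
s = pd.Series(['café;', 7, np.nan])

# str() succeeds on Python 3, but silently turns 7 into '7' and NaN into 'nan'.
converted = s.map(lambda x: str(x).replace(';', ''))
print(converted.tolist())  # ['café', '7', 'nan']
```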

In Excel this is a very simple operation: all it takes is replacing ; with an empty string. How can I do the same in pandas for the entire data frame, regardless of data types? Or am I missing something?

jezrael · Accepted Answer

You can use DataFrame.replace with regex=True, and select the columns to change with a subset of column names:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':['f;','d:','sda;sd'],
                   'D':['s','d;','d;p'],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B       C    D  E  F
0  1  4      f;    s  5  7
1  2  5      d:   d;  3  4
2  3  6  sda;sd  d;p  6  3

cols_to_check = ['C','D', 'E']

print (df[cols_to_check])
        C    D  E
0      f;    s  5
1      d:   d;  3
2  sda;sd  d;p  6

df[cols_to_check] = df[cols_to_check].replace({';':''}, regex=True)
print (df)
   A  B      C   D  E  F
0  1  4      f   s  5  7
1  2  5     d:   d  3  4
2  3  6  sdasd  dp  6  3
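Since the question asks about the entire data frame, the same call can also be applied to every column at once: with regex=True, DataFrame.replace only rewrites string values, so numeric values and NaN are left alone. A minimal sketch follows; `remove_char` is a hypothetical helper name, and re.escape is used so that characters with special meaning in regular expressions (such as '.' or '|') are matched literally.

```python
import re

import numpy as np
import pandas as pd


def remove_char(df: pd.DataFrame, char: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with every occurrence of
    `char` removed from string cells; numbers and NaN are left unchanged."""
    # re.escape makes the character literal even if it is a regex
    # metacharacter such as '.', '|' or '$'.
    return df.replace(re.escape(char), '', regex=True)


df = pd.DataFrame({'A': [1, 2, 3],
                   'C': ['f;', 'd:', 'sda;sd'],
                   'G': ['x;', 4.5, np.nan]})

out = remove_char(df, ';')
print(out)
```

DataFrame.replace returns a new frame by default, so assign the result back (e.g. df = remove_char(df, ';')) to keep it.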

Tags: python, string, replace, pandas

MJB
