
Not able to replace the string containing $ in pandas column

I have a dataframe

import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[5, '12$sell', '1$sell']})

I want to remove $sell from the values in column b.

So I tried the str.replace() method, like below:

df['b'] = df['b'].str.replace("$sell","")

but it doesn't replace the given string, and it gives me the same dataframe as the original.

It works when I use replace() with apply:

df['b'] = df['b'].apply(lambda x: str(x).replace("$sell",""))

So I want to know why it does not work in the previous case.

Note: I tried replacing only $ and shockingly it works.

asked Sep 28 '18 12:09 by Sociopath



1 Answer

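$ is a regex metacharacter (it anchors the end of the string), and in pandas versions before 2.0 str.replace treats the pattern as a regex by default, so "$sell" never matches. Replacing only "$" worked because those versions treated a single-character pattern as a literal string. Escape the $ or pass regex=False:

df['b'] = df['b'].str.replace(r"\$sell", "", regex=True)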
print (df)
   a    b
0  1  NaN
1  2   12
2  3    1

df['b'] = df['b'].str.replace("$sell","", regex=False)
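
To see why the unescaped pattern never matches, the same behaviour can be reproduced with Python's re module alone. This is only a minimal sketch: the sample list and the variable names are illustrative and a plain list stands in for the column.

```python
import re

values = ["12$sell", "1$sell"]

# "$" anchors the end of the string, so "$sell" asks for the text "sell"
# to appear after the end of the string, which can never happen.
unescaped = [re.sub("$sell", "", v) for v in values]

# Escaping "$" by hand makes it a literal dollar sign.
escaped = [re.sub(r"\$sell", "", v) for v in values]

# re.escape builds the escaped pattern from the literal text.
auto_escaped = [re.sub(re.escape("$sell"), "", v) for v in values]

print(unescaped)     # strings are unchanged
print(escaped)       # "$sell" removed
print(auto_escaped)  # "$sell" removed
```

re.escape is handy when the text to remove comes from user input or a variable, since it escapes every metacharacter, not just $.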

If you also want to keep the value 5, which is numeric, use Series.replace with regex=True to replace substrings; numeric values are not touched:

df['b'] = df['b'].replace(r"\$sell", "", regex=True)

print (df['b'].apply(type))
0    <class 'int'>
1    <class 'str'>
2    <class 'str'>
Name: b, dtype: object

Or cast all values in the column to strings:

df['b'] = df['b'].astype(str).str.replace("$sell","", regex=False)

print (df['b'].apply(type))
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
Name: b, dtype: object

For better performance, if missing values are not possible, use a list comprehension:

df['b'] = [str(x).replace("$sell","") for x in  df['b']]

print (df)
   a   b
0  1   5
1  2  12
2  3   1
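
If the column may contain missing values, note that str(x) turns None into the text 'None'. One possible way around that is to touch only the strings and pass everything else through. In this sketch strip_marker is a hypothetical helper, not a pandas function, and a plain list stands in for df['b']:

```python
def strip_marker(value, marker="$sell"):
    """Remove a literal marker from strings; return any other value unchanged."""
    if isinstance(value, str):
        # str.replace is a plain-text replacement, so "$" needs no escaping
        return value.replace(marker, "")
    return value


# Hypothetical column contents: an int, two strings and a missing value
column = [5, "12$sell", "1$sell", None]
cleaned = [strip_marker(x) for x in column]
print(cleaned)
```

Unlike the str(x) version, this keeps 5 as an integer and leaves the missing value as it was.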
answered Sep 28 '22 01:09 by jezrael