Python pandas equivalent for replace

In R, there is a rather useful replace function. Essentially, it does conditional reassignment in a given column of a data frame. It can be used like so: replace(df$column, df$column==1, 'Type 1')

What is a good way to achieve the same in pandas?

Should I use a lambda with apply? (If so, how do I get a reference to the given column, as opposed to a whole row?)

Should I use np.where on data_frame.values? It seems like I am missing a very obvious thing here.

Any suggestions are appreciated.

ivan-k asked Aug 28 '12 04:08


2 Answers

pandas has a replace method too:

    In [25]: df = DataFrame({1: [2,3,4], 2: [3,4,5]})

    In [26]: df
    Out[26]:
       1  2
    0  2  3
    1  3  4
    2  4  5

    In [27]: df[2]
    Out[27]:
    0    3
    1    4
    2    5
    Name: 2

    In [28]: df[2].replace(4, 17)
    Out[28]:
    0     3
    1    17
    2     5
    Name: 2

    In [29]: df[2].replace(4, 17, inplace=True)
    Out[29]:
    0     3
    1    17
    2     5
    Name: 2

    In [30]: df
    Out[30]:
       1   2
    0  2   3
    1  3  17
    2  4   5
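The session above relies on inplace=True applied to a selected column. In newer pandas releases that pattern is not guaranteed to update the parent DataFrame, and the call may return None rather than the Series (this is an assumption about later versions, not something the answer states). A minimal sketch of the same replacement that assigns the result back to the column instead:

```python
import pandas as pd

# Same frame as in the session above
df = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# replace() returns a new Series; assigning it back updates the column
df[2] = df[2].replace(4, 17)

print(df)
```

Column 1 is left untouched, even though it also contains a 4, because only column 2 is reassigned.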

or you could use numpy-style advanced indexing:

    In [47]: df[1]
    Out[47]:
    0    2
    1    3
    2    4
    Name: 1

    In [48]: df[1] == 4
    Out[48]:
    0    False
    1    False
    2     True
    Name: 1

    In [49]: df[1][df[1] == 4]
    Out[49]:
    2    4
    Name: 1

    In [50]: df[1][df[1] == 4] = 19

    In [51]: df
    Out[51]:
        1   2
    0   2   3
    1   3  17
    2  19   5

DSM answered Nov 14 '22 02:11
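The chained form df[1][df[1] == 4] = 19 can trigger a chained-assignment warning in later pandas versions. As a hedged alternative, the sketch below does the same conditional assignment with a single .loc call, which is probably the closest analogue to R's replace(df$column, df$column==1, ...). It also shows Series.mask, which returns a new Series instead of modifying the frame. The variable name masked is only illustrative.

```python
import pandas as pd

# State of the frame before In [50] in the session above
df = pd.DataFrame({1: [2, 3, 4], 2: [3, 17, 5]})

# Boolean mask picks the rows, the second argument picks the column label
df.loc[df[1] == 4, 1] = 19

# Non-mutating variant: values where the condition is True are replaced
masked = df[2].mask(df[2] == 17, 0)

print(df)
print(masked)
```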


Pandas doc for replace does not have any examples, so I will give some here. For those coming from an R perspective (like me), replace is basically an all-purpose replacement function that combines the functionality of R functions plyr::mapvalues, plyr::revalue and stringr::str_replace_all. Since DSM covered the case of single values, I will cover the multi-value case.

Example series

    In [10]: x = pd.Series([1, 2, 3, 4])

    In [11]: x
    Out[11]:
    0    1
    1    2
    2    3
    3    4
    dtype: int64

We want to replace the positive integers with negative integers (and not by multiplying by -1).

Two lists of values

One way to do this is to have one list (or pandas Series) of the values we want to replace and a second list with the values we want to replace them with.

    In [14]: x.replace([1, 2, 3, 4], [-1, -2, -3, -4])
    Out[14]:
    0   -1
    1   -2
    2   -3
    3   -4
    dtype: int64

This corresponds to plyr::mapvalues.

Dictionary of value pairs

Sometimes it's more convenient to have a dictionary of value pairs. The key is the value we replace, and the dictionary value is what we replace it with.

    In [15]: x.replace({1: -1, 2: -2, 3: -3, 4: -4})
    Out[15]:
    0   -1
    1   -2
    2   -3
    3   -4
    dtype: int64

This corresponds to plyr::revalue.
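One detail worth illustrating: when the dictionary covers only some of the values, replace leaves the rest unchanged. This differs from Series.map, which returns NaN for anything missing from the dictionary. A small sketch, where the dictionary name partial is made up for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# Hypothetical mapping that covers only two of the four values
partial = {1: -1, 2: -2}

# replace: unmatched values (3 and 4) are kept as they are
replaced = x.replace(partial)

# map: unmatched values become NaN
mapped = x.map(partial)

print(replaced)
print(mapped)
```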

Strings

It works similarly for strings, except that we also have the option of using regex patterns.

If we simply want to replace strings with other strings, it works exactly the same as before:

    In [18]: s = pd.Series(["ape", "monkey", "seagull"])

    In [22]: s
    Out[22]:
    0        ape
    1     monkey
    2    seagull
    dtype: object

Two lists

    In [25]: s.replace(["ape", "monkey"], ["lion", "panda"])
    Out[25]:
    0       lion
    1      panda
    2    seagull
    dtype: object

Dictionary

    In [26]: s.replace({"ape": "lion", "monkey": "panda"})
    Out[26]:
    0       lion
    1      panda
    2    seagull
    dtype: object

Regex

Replace all "a"s with "x"s.

    In [27]: s.replace("a", "x", regex=True)
    Out[27]:
    0        xpe
    1     monkey
    2    sexgull
    dtype: object

Replace all "l"s with "x"s.

    In [28]: s.replace("l", "x", regex=True)
    Out[28]:
    0        ape
    1     monkey
    2    seaguxx
    dtype: object

Note that both "l"s in "seagull" were replaced.

Replace "a"s with "x"s and "l"s with "p"s.

    In [29]: s.replace(["a", "l"], ["x", "p"], regex=True)
    Out[29]:
    0        xpe
    1     monkey
    2    sexgupp
    dtype: object

In the special case where one wants to replace multiple different values with the same value, one can simply pass a single string as the replacement. It must not be inside a list.

Replace "a"s and "l"s with "p"s.

    In [29]: s.replace(["a", "l"], "p", regex=True)
    Out[29]:
    0        ppe
    1     monkey
    2    sepgupp
    dtype: object

(Credit to DaveL17 in the comments)
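Two points from the regex examples can be made concrete. Without regex=True, replace only matches values that equal the search string as a whole. With regex=True, the search string is treated as a regular expression, so metacharacters such as "." need escaping. The sketch below uses a small made-up series and re.escape from the standard library.

```python
import re
import pandas as pd

# Illustrative data, not from the answer above
s = pd.Series(["ape", "a.b", "seagull"])

# No regex: only elements exactly equal to "a" would change, so nothing does
whole = s.replace("a", "x")

# regex=True: the pattern is matched inside each string
sub = s.replace("a", "x", regex=True)

# Escape "." so it matches a literal dot rather than any character
literal = s.replace(re.escape("."), "-", regex=True)

print(whole.tolist())
print(sub.tolist())
print(literal.tolist())
```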

CoderGuy123 answered Nov 14 '22 02:11