I am passing a dictionary to the map function to recode values in a column of a Pandas DataFrame. However, I noticed that if a value in the original series is not explicitly in the dictionary, it gets recoded to NaN. Here is a simple example:
Typing...
    import pandas as pd
    s = pd.Series(['one','two','three','four'])
...creates the series
    0      one
    1      two
    2    three
    3     four
    dtype: object
But applying the map...
    recodes = {'one':'A', 'two':'B', 'three':'C'}
    s.map(recodes)
...returns the series
    0      A
    1      B
    2      C
    3    NaN
    dtype: object
I would prefer that if any element in series s is not in the recodes dictionary, it remains unchanged. That is, I would prefer to return the series below (with the original four instead of NaN).
    0       A
    1       B
    2       C
    3    four
    dtype: object
Is there an easy way to do this, for example an option to pass to the map function? The challenge I am having is that I can't always anticipate all possible values that will be in the series I'm recoding; the data will be updated in the future and new values could appear.
Thanks!
Use replace instead of map:
    >>> s = pd.Series(['one','two','three','four'])
    >>> recodes = {'one':'A', 'two':'B', 'three':'C'}
    >>> s.map(recodes)
    0      A
    1      B
    2      C
    3    NaN
    dtype: object
    >>> s.replace(recodes)
    0       A
    1       B
    2       C
    3    four
    dtype: object
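Since the question is about a column of a DataFrame rather than a standalone Series, here is a minimal sketch of the same replace call applied to one column and assigned back. The DataFrame and the column names label and n are hypothetical placeholders; substitute your own.

```python
import pandas as pd

# Hypothetical DataFrame; "label" stands in for the real column to recode.
df = pd.DataFrame({'label': ['one', 'two', 'three', 'four'], 'n': [1, 2, 3, 4]})
recodes = {'one': 'A', 'two': 'B', 'three': 'C'}

# replace() only changes values that appear as keys in recodes;
# anything else (here 'four') is left exactly as it was.
df['label'] = df['label'].replace(recodes)

print(df)
```

Because unmatched values pass through, a value that appears in future data without a dictionary entry stays in the column as-is instead of turning into NaN.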
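If you still want to use map (it can be faster than replace in some cases), you can control what happens to missing keys: when the mapping is a dict subclass that defines __missing__, Series.map uses that method instead of returning NaN. A subclass whose __missing__ returns the key itself leaves unmapped values unchanged: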
    import pandas as pd

    class MyDict(dict):
        # Called for keys that are not in the dict; return the key unchanged
        def __missing__(self, key):
            return key

    s = pd.Series(['one', 'two', 'three', 'four'])
    recodes = MyDict({
        'one':'A',
        'two':'B',
        'three':'C'
    })

    s.map(recodes)

    0       A
    1       B
    2       C
    3    four
    dtype: object
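Two other common ways to keep map while leaving unmatched values alone are sketched below; they are not part of the original answer. The first maps with the plain dict and then fills the resulting NaN from the original series; the second maps through a callable that falls back to the input value via dict.get. The fillna variant assumes no dictionary value is itself NaN, because such a value would be filled back with the original.

```python
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'])
recodes = {'one': 'A', 'two': 'B', 'three': 'C'}

# Option 1: plain-dict map, then fill the NaN left by unmatched keys
# with the original values (fillna aligns the two series on the index).
mapped_then_filled = s.map(recodes).fillna(s)

# Option 2: map through a callable; dict.get returns the value itself
# when it is not a key in recodes.
mapped_with_get = s.map(lambda value: recodes.get(value, value))

print(mapped_then_filled.tolist())
print(mapped_with_get.tolist())
```

Both should print ['A', 'B', 'C', 'four']. The callable version is the more explicit of the two, while the dict subclass above keeps map on its dictionary path.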