I have a dictionary of keys and values. I want to "map" the numbers in a dataframe column, where the original column is the keys and the new column is the values.
However, any values that are not included in the dictionary should be coded as 999.
Original dataframe:
Col1
0 02
1 03
2 02
3 02
4 04
5 88
6 77
Dictionary:
codes = {'01':'05',
'02':'06',
'03':'07',
'04':'08'}
Expected output:
>>> df['ColNew'] = df['Col1'].map(codes)
ColNew
0 06
1 07
2 06
3 06
4 08
5 999
6 999
I'm not sure how to do this, other than to include the 999 codes in the dictionary in the first place. That's frustrating when there are over a hundred codes involved and only a few of them need to be anything other than 999.
Use map and dict.get. dict.get allows you to pass a default value in the event the key doesn't exist.
df['ColNew'] = df['Col1'].map(lambda x: codes.get(x, 999))
df
Col1 ColNew
0 02 06
1 03 07
2 02 06
3 02 06
4 04 08
5 88 999
6 77 999
This will also preserve the dtypes. In this case it doesn't matter because the dtype of the column is object. However, if it were int, map would turn it into float when NaN came back. By having a default value, we avoid the type conversion.
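To make the dtype point concrete, here is a minimal sketch with an integer column. The integer version of codes and the names s, plain and with_default are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical integer version of the question's data.
codes = {1: 5, 2: 6, 3: 7, 4: 8}
s = pd.Series([2, 3, 2, 2, 4, 88, 77])

# Passing the dict directly: unmatched keys become NaN, so the result is float.
plain = s.map(codes)

# Using dict.get with a default: no NaN appears, so the result stays integer.
with_default = s.map(lambda x: codes.get(x, 999))

print(plain.dtype)
print(with_default.dtype)
```

The first print should show a float dtype and the second an integer dtype.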
Note: This is an inferior answer to that of piRSquared due to the type conversion:
You can simply fill the NaNs afterwards.
df['ColNew'] = df.Col1.map(codes).fillna('999')
Result:
ColNew
0 06
1 07
2 06
3 06
4 08
5 999
6 999
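If the column were integer rather than object, the fillna route would go through float first. A rough sketch of casting back afterwards, assuming integer keys and values (the data and the names filled and restored are illustrative only):

```python
import pandas as pd

# Hypothetical integer version of the question's data.
codes = {1: 5, 2: 6, 3: 7, 4: 8}
df = pd.DataFrame({'Col1': [2, 3, 2, 2, 4, 88, 77]})

# map leaves NaN for unmatched keys, which forces a float column.
filled = df['Col1'].map(codes).fillna(999)

# Once the NaNs are gone, the column can be cast back to integer.
restored = filled.astype('int64')

print(filled.dtype)
print(restored.dtype)
```

This costs an extra cast, which is why the dict.get approach is tidier when the dtype matters.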
One interesting thing is that the na_action parameter to Series.map is not used as a default mapping argument, as I was originally tempted to think.
Its purpose is actually to control whether NaN values are affected by the mapping function. If you didn't need to map them in any way, you would see a potential performance increase by setting na_action='ignore'.
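A small sketch of what na_action changes, assuming an object column that already contains a NaN (the sample data and variable names are made up for illustration):

```python
import numpy as np
import pandas as pd

codes = {'02': '06', '03': '07'}
# dtype=object keeps the missing entry as a plain float NaN.
s = pd.Series(['02', np.nan, '88'], dtype=object)

# Default behaviour: the function is also called on NaN,
# so the missing entry picks up the '999' default.
default_all = s.map(lambda x: codes.get(x, '999'))

# na_action='ignore': NaN is passed through without calling the function.
skip_nan = s.map(lambda x: codes.get(x, '999'), na_action='ignore')

print(default_all.tolist())
print(skip_nan.tolist())
```

So na_action decides whether existing NaNs reach the function. It does not supply a fallback for keys missing from the dictionary.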