
pandas: map to new column, excluding some codes

Tags:

python

pandas

I have a dictionary of keys and values. I want to map the codes in a dataframe column to a new column, where the original column holds the dictionary keys and the new column holds the corresponding values.

However, any values that are not included in the dictionary should be coded as 999.

Original dataframe:

     Col1
0    02
1    03
2    02
3    02
4    04
5    88
6    77

Dictionary:

codes = {'01':'05',
         '02':'06',
         '03':'07',
         '04':'08'}

Expected output:

>>> df['ColNew'] = df['Col1'].map(codes)

     ColNew
0    06
1    07
2    06
3    06
4    08
5    999
6    999

I'm not sure how to do this, other than to include the 999 codes in the dictionary in the first place. That's frustrating when there are over a hundred codes involved and only a few of them need to be anything other than 999.

ale19 asked Apr 13 '17 14:04



2 Answers

Use map and dict.get.
dict.get lets you pass a default value that is returned when the key doesn't exist.

df['ColNew'] = df['Col1'].map(lambda x: codes.get(x, 999))

df

  Col1 ColNew
0   02     06
1   03     07
2   02     06
3   02     06
4   04     08
5   88    999
6   77    999

This will also preserve the dtypes. In this case it doesn't matter because the dtype of the column is object.

However, if the values were int, mapping with the plain dictionary would turn the result into float as soon as an unmatched key came back as NaN. By supplying a default value, we avoid that type conversion.
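To make the dtype point concrete, here is a minimal sketch. The integer Series `s` and the dictionary `int_codes` are made-up illustration data, not taken from the question.

```python
import pandas as pd

# Hypothetical integer data (not from the question above).
s = pd.Series([1, 2, 3])
int_codes = {1: 10, 2: 20}

# Plain dict mapping: the unmatched key 3 becomes NaN, so the result is float64.
mapped_plain = s.map(int_codes)

# dict.get with a default: every result is an int, so an integer dtype is kept.
mapped_default = s.map(lambda x: int_codes.get(x, 999))

print(mapped_plain.dtype)        # float64
print(mapped_default.tolist())   # [10, 20, 999]
```

Note that in the string-coded example from the question, a default of '999' (a string) rather than 999 would keep all values in ColNew the same type.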

piRSquared answered Oct 02 '22 03:10


Note: This is an inferior answer to that of piRSquared due to the type conversion:

You can simply fill the NaNs afterwards.

df['ColNew'] = df.Col1.map(codes).fillna('999')

Result:

     ColNew
0    06
1    07
2    06
3    06
4    08
5    999
6    999

One interesting thing is that the na_action parameter to Series.map does not supply a default value for unmatched keys, as I was originally tempted to think.

Its purpose is actually to control whether NaN values already present in the Series are passed to the mapping function. If you do not need to map them, setting na_action='ignore' leaves them as NaN and may give a small performance gain.
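A small sketch of what na_action does in practice. The Series below, with a missing value, and the helper `lookup` are hypothetical and only serve as illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value; object dtype as in the question.
s = pd.Series(['02', np.nan, '88'], dtype=object)
codes = {'02': '06'}

def lookup(x):
    # Return the mapped code, or '999' for anything not in the dictionary.
    return codes.get(x, '999')

# Default behaviour: the function is also called on NaN, which gets '999'.
with_nan_mapped = s.map(lookup)

# na_action='ignore': NaN is skipped and stays NaN in the result.
nan_skipped = s.map(lookup, na_action='ignore')

print(with_nan_mapped.tolist())       # ['06', '999', '999']
print(nan_skipped.isna().tolist())    # [False, True, False]
```

So na_action decides what happens to missing inputs, while the default for unmatched keys still has to come from dict.get or fillna.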

miradulo answered Oct 02 '22 03:10