
pandas: map to new column, excluding some codes

Tags: python, pandas

I have a dictionary of keys and values. I want to "map" the numbers in a dataframe column, where the original column is the keys and the new column is the values.

However, any values that are not included in the dictionary should be coded as 999.

Original dataframe:

  Col1
0   02
1   03
2   02
3   02
4   04
5   88
6   77

Dictionary:

codes = {'01':'05',
         '02':'06',
         '03':'07',
         '04':'08'}

Expected output:

>>> df['ColNew'] = df['Col1'].map(codes)

     ColNew
0    06
1    07
2    06
3    06
4    08
5    999
6    999

I'm not sure how to do this, other than to include the 999 codes in the dictionary in the first place. That's frustrating when there are over a hundred codes involved and only a few of them need to be anything other than 999.

asked Apr 13 '17 14:04

ale19

2 Answers

Use map with dict.get.
dict.get lets you pass a default value that is returned when the key doesn't exist.

df['ColNew'] = df['Col1'].map(lambda x: codes.get(x, 999))

df

  Col1 ColNew
0   02     06
1   03     07
2   02     06
3   02     06
4   04     08
5   88    999
6   77    999

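This also preserves the dtype. In this case it doesn't matter because the dtype of the column is object. Note that the default 999 here is an integer while the mapped codes are strings; pass '999' instead if the column should hold only strings.

However, if the values were int, a plain map(codes) would return NaN for the missing keys and turn the column into float. Supplying a default value avoids that type conversion.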
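To make the dtype point concrete, here is a minimal sketch. It assumes a hypothetical integer version of the data; the names s and int_codes are invented for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical integer data (assumption, not from the question)
s = pd.Series([2, 3, 88])
int_codes = {1: 5, 2: 6, 3: 7}

# Plain dict mapping: the missing key 88 becomes NaN, so the result is float
plain = s.map(int_codes)

# dict.get with a default: every returned value is an int,
# so the result keeps an integer dtype
with_default = s.map(lambda x: int_codes.get(x, 999))

print(plain.dtype)
print(with_default.dtype)
```

The first Series should report a float dtype and the second an integer dtype, which is the difference the answer describes.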

answered Oct 02 '22 03:10

piRSquared

Note: This is an inferior answer to that of piRSquared due to the type conversion:

You can simply fill the NaNs afterwards.

df['ColNew'] = df.Col1.map(codes).fillna('999')

Result:

  ColNew
0     06
1     07
2     06
3     06
4     08
5    999
6    999
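If the codes were integers rather than strings, the float conversion mentioned in the note could be undone after filling. This is only a sketch under that assumption: the integer data is hypothetical, and the astype step is an addition that is not part of the answer above.

```python
import pandas as pd

# Hypothetical integer data (assumption, not from the question)
s = pd.Series([2, 3, 88])
int_codes = {1: 5, 2: 6, 3: 7}

# map leaves NaN for the missing key, so the filled result is still float
filled = s.map(int_codes).fillna(999)

# Cast back to integer once no NaN values remain
restored = filled.astype("int64")

print(filled.dtype)
print(restored.tolist())
```

The cast is only safe after fillna, because an integer column of this kind cannot hold NaN.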

One interesting thing is that the na_action parameter to Series.map is not used as a default mapping argument, as I was originally tempted to think.

Its purpose is to control whether NaN values are passed to the mapping function. If the NaN values don't need to be mapped at all, setting na_action='ignore' skips them and may give a performance increase.
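A small sketch of what na_action does in practice, assuming a hypothetical column that contains a missing value (the Series s and the helper lookup are made up for illustration):

```python
import numpy as np
import pandas as pd

codes = {'01': '05', '02': '06', '03': '07', '04': '08'}

# Hypothetical column with one missing entry (assumption, not from the question)
s = pd.Series(['02', np.nan, '88'], dtype=object)


def lookup(value):
    """Return the mapped code, or '999' when the key is not in codes."""
    return codes.get(value, '999')


# Default behaviour: NaN is passed to lookup and receives the default '999'
mapped_all = s.map(lookup)

# na_action='ignore': NaN is never passed to lookup and stays missing
skipped = s.map(lookup, na_action='ignore')

print(mapped_all.tolist())
print(skipped.isna().tolist())
```

So na_action decides whether missing values reach the function at all; it does not supply a default for keys that are absent from the dictionary.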

answered Oct 02 '22 03:10

miradulo
