Rescaling to (0,1) certain columns from Pandas Python dataframe

Tags:

python

pandas

dataframe

scaling

I have the following type of dataframe:

  Channel   Region  Fresh   Milk    Grocery Frozen  Detergents_Paper    Delicassen
0   2         3     12669   9656    7561    214        2674             1338
1   2         3     7057    9810    9568    1762       3293             1776
2   2         3     6353    8808    7684    2405       3516             7844
3   1         3     13265   1196    4221    6404       507              1788
4   2         3     22615   5410    7198    3915       1777             5185

I would like to do two things:

1) Rescale only certain columns, not all of them, so that their values lie between 0 and 1. I would like to select those columns by position rather than by name. Imagine I want to change 200 columns and don't want to type out every name.

The code I tried was:

df /= df.max()

But that scales every column to (0,1), not only the ones I want, and I can't find a way to select just some of them.

2) I would also like to rescale columns independently of each other: one scale only for Milk and another only for Frozen, for instance.

I want to rescale each one separately. For example, I might divide one column by 100 because its values are too big, but divide another by only 10 because 100 would be too much. How would I do that?

asked Jun 29 '16 15:06

Antonio López Ruiz

2 Answers

For 1, you can select a list of columns like this:

df[['Milk','Frozen','Grocery']]

Therefore, to rescale only those three columns, use:

df[['Milk','Frozen','Grocery']] -= df[['Milk','Frozen','Grocery']].min()
df[['Milk','Frozen','Grocery']] /= df[['Milk','Frozen','Grocery']].max()

This already scales each column independently of the others, if that is what your second question means.

EDIT:

If you want to select the first 200 columns of your dataframe, you can use df.columns, which gives you the column labels, and slice it by position:

df[df.columns[:200]] -= df[df.columns[:200]].min()
df[df.columns[:200]] /= df[df.columns[:200]].max()

The max method on a pandas DataFrame returns a Series holding the maximum of each column. Because the minimum was subtracted first, the code above leaves each selected column with a minimum of exactly 0 and a maximum of exactly 1.
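As a minimal sketch of the same idea, here is position-based min-max scaling written without in-place operators. The slice 2:6 and the name scaled are my own choices for illustration, and the data is the sample from the question with the last two columns left out for brevity.

```python
import pandas as pd

# Sample data from the question (last two columns omitted for brevity)
df = pd.DataFrame({
    "Channel": [2, 2, 2, 1, 2],
    "Region": [3, 3, 3, 3, 3],
    "Fresh": [12669, 7057, 6353, 13265, 22615],
    "Milk": [9656, 9810, 8808, 1196, 5410],
    "Grocery": [7561, 9568, 7684, 4221, 7198],
    "Frozen": [214, 1762, 2405, 6404, 3915],
})

# Select columns by position: here positions 2 to 5 (Fresh through Frozen).
# The slice is an arbitrary choice for this example.
cols = df.columns[2:6]

subset = df[cols]
col_min = subset.min()              # Series: one minimum per column
col_range = subset.max() - col_min  # Series: one range per column

# Work on a copy so the original frame stays untouched
scaled = df.copy()
scaled[cols] = (subset - col_min) / col_range
```

Channel and Region are left as they were. Note that a constant column such as Region has a range of 0, so including it in the selection would produce NaN values from 0/0.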

If you don't want to divide by each column's maximum, but rather the first column by n1, the second by n2, and so on, you can use the same notation:

df[df.columns[:4]] /= [n1,n2,n3,n4]
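If keeping a positional list of divisors in sync with the column order feels fragile, one alternative is to put the divisors in a Series keyed by column name, since pandas aligns a Series with a DataFrame's columns when dividing. This is only a sketch: the divisor values 100 and 10 are the illustrative numbers from the question, and the names divisors and targets are hypothetical.

```python
import pandas as pd

# A few rows of the question's data, reduced to three columns
df = pd.DataFrame({
    "Fresh": [12669, 7057, 6353],
    "Milk": [9656, 9810, 8808],
    "Frozen": [214, 1762, 2405],
})

# Hypothetical per-column divisors, keyed by column name
divisors = pd.Series({"Milk": 100, "Frozen": 10})
targets = list(divisors.index)

# Dividing a DataFrame by a Series matches the Series index
# against the column labels, so each column gets its own divisor
scaled = df.copy()
scaled[targets] = df[targets] / divisors
```

Columns that are not listed in divisors, such as Fresh here, are not modified.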

answered Nov 15 '22 03:11

ysearka

Here's a solution for a single column that rescales to the range [0, 1]:

import pandas as pd

a = [5,15,25,35,45,50,55,65,75,85,95]
df = pd.DataFrame(data=a, columns=['a'])
# Min-max scaling: subtract the minimum, then divide by the range
df['rescale'] = (df['a'] - df['a'].min()) / (df['a'].max() - df['a'].min())

Also a numpy method:

import numpy as np
# np.ptp ("peak to peak") returns max - min
rescale = (np.asarray(a) - np.min(a)) / np.ptp(a)
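The numpy approach can also cover several columns at once by reducing along axis 0. The sketch below assumes a 2-D float array with one row per observation and one column per feature; the values are the Milk and Frozen columns from the question.

```python
import numpy as np

# Rows are observations, columns are features (Milk, Frozen)
data = np.array([
    [9656.0, 214.0],
    [9810.0, 1762.0],
    [8808.0, 2405.0],
    [1196.0, 6404.0],
    [5410.0, 3915.0],
])

# axis=0 reduces over rows, giving one minimum and one range per column;
# broadcasting then applies them to every row
rescaled = (data - data.min(axis=0)) / np.ptp(data, axis=0)
```

Each column is scaled with its own minimum and range, so the two features do not influence each other.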

answered Nov 15 '22 02:11

smartse
