
Rescaling to (0,1) certain columns from Pandas Python dataframe

I have the following type of dataframe:

   Channel  Region  Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicassen
0        2       3  12669  9656     7561     214              2674        1338
1        2       3   7057  9810     9568    1762              3293        1776
2        2       3   6353  8808     7684    2405              3516        7844
3        1       3  13265  1196     4221    6404               507        1788
4        2       3  22615  5410     7198    3915              1777        5185

I would like to do two things:

1) Rescale only certain columns, not all of them, so that their values fall between 0 and 1. I would like to select those columns by position rather than by name. Imagine I need to rescale 200 columns and don't want to type out every name.

The code I tried was:

df /= df.max() 

But this rescales all of the columns to (0,1), not only the ones I want, and I can't find a way to apply it to just some of them.

2) Rescale columns independently of one another. What I mean is that I would like one scale for Milk only and a different one for Frozen only, for instance.

For example, I might divide one column by 100 because its values are too big, but divide another column by only 10 because 100 would be too much. How would I do that?

asked Jun 29 '16 15:06 by Antonio López Ruiz




2 Answers

For 1, you can select a list of columns like this:

df[['Milk','Frozen','Grocery']]

Therefore, to rescale only those three columns, use:

df[['Milk','Frozen','Grocery']] -= df[['Milk','Frozen','Grocery']].min()
df[['Milk','Frozen','Grocery']] /= df[['Milk','Frozen','Grocery']].max()

This method already scales each column independently of the others, if that is what your second question means.

EDIT:

If you want to select the first 200 columns of your dataframe, you can slice df.columns, which holds the labels of your columns:

df[df.columns[:200]] -= df[df.columns[:200]].min()
df[df.columns[:200]] /= df[df.columns[:200]].max()

The max method on a pandas dataframe returns a Series holding the max of each column, so the division is applied column by column. Therefore, if you use the above code, the max value in each of those columns will be exactly 1 (and the min exactly 0).
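If the columns you want are not a contiguous block at the start, the same idea should work with any list of integer positions. A minimal sketch, assuming the sample rows from the question; the `positions` list and the names `cols`, `sub` and `scaled` are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Channel": [2, 2, 2, 1, 2],
    "Region": [3, 3, 3, 3, 3],
    "Fresh": [12669, 7057, 6353, 13265, 22615],
    "Milk": [9656, 9810, 8808, 1196, 5410],
    "Grocery": [7561, 9568, 7684, 4221, 7198],
    "Frozen": [214, 1762, 2405, 6404, 3915],
    "Detergents_Paper": [2674, 3293, 3516, 507, 1777],
    "Delicassen": [1338, 1776, 7844, 1788, 5185],
})

# Hypothetical choice of positions: Milk (3), Grocery (4) and Frozen (5).
positions = [3, 4, 5]
cols = list(df.columns[positions])

# Min-max scale each selected column on its own, working in float.
sub = df[cols].astype(float)
scaled = (sub - sub.min()) / (sub.max() - sub.min())

# Write the result back; all other columns are left untouched.
df[cols] = scaled
```

Building the scaled frame first and assigning it back once avoids relying on in-place operators on an integer-typed selection.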

If you don't want to divide by the max of each column, but rather the first column by n1, the second column by n2, and so on, you can use the same notation:

df[df.columns[:4]] /= [n1,n2,n3,n4]
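With many columns it can be easier to keep track of which divisor belongs to which column by naming them. A sketch under assumptions: the divisors (100 for Milk, 10 for Frozen) are made up to match the example in the question, and `divisors` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    "Milk": [9656, 9810, 8808, 1196, 5410],
    "Frozen": [214, 1762, 2405, 6404, 3915],
    "Grocery": [7561, 9568, 7684, 4221, 7198],
})

# Hypothetical divisors, one entry per column to rescale.
divisors = pd.Series({"Milk": 100, "Frozen": 10})
cols = list(divisors.index)

# DataFrame / Series aligns on column labels, so each column
# is divided by its own entry regardless of ordering.
df[cols] = df[cols] / divisors
```

Columns that have no entry in `divisors` (here Grocery) keep their original values.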
answered Nov 15 '22 03:11 by ysearka


Here's a solution for a single column that actually rescales it to the range [0, 1]:

import pandas as pd

a = [5,15,25,35,45,50,55,65,75,85,95]
df = pd.DataFrame(data=a, columns=['a'])
# subtract the minimum, then divide by the range (max - min)
df['rescale'] = (df['a'] - df['a'].min()) / (df['a'].max() - df['a'].min())

Also a numpy method:

import numpy as np
# np.ptp ("peak to peak") returns max(a) - min(a)
rescale = (a - np.min(a))/np.ptp(a)
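One thing to watch for with either version: if every value in a column is the same, the range is zero and the division gives NaN. A possible guard, as a sketch only; the function name `rescale01` and the decision to map constant input to all zeros are assumptions, not something taken from the answers here:

```python
import numpy as np

def rescale01(values):
    """Min-max scale a 1-D sequence to [0, 1]; constant input maps to zeros."""
    arr = np.asarray(values, dtype=float)
    span = np.ptp(arr)  # max - min
    if span == 0:
        # Assumed convention: no spread means everything becomes 0.
        return np.zeros_like(arr)
    return (arr - arr.min()) / span

a = [5, 15, 25, 35, 45, 50, 55, 65, 75, 85, 95]
rescaled = rescale01(a)
```

For a dataframe, something like `df[cols].apply(rescale01)` should apply it column by column, where `cols` is whatever list of column labels you selected.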
answered Nov 15 '22 02:11 by smartse