
Pandas: Find the maximum range in all the columns of dataframe

Tags:

python

pandas

I'm very new to programming, so hopefully I'll ask my question clearly and you can guide me to the answer.

I have a dataframe "x", where the index represents the week of the year and each column holds the numerical values for one city. I'm trying to find the column that has the maximum range (i.e., maximum value - minimum value). I imagine this will need a loop to find the maximum and minimum of each column, store the differences in an object (or perhaps as a new row at the bottom), and then find the max in that object (or row).

The dataframe looks like this:

        City1 City2 ... CityN 
week
1
2
3
4
...
53

Feedback on etiquette or wording is also appreciated.

asked Jul 15 '14 02:07 by HolaGonzalo



1 Answer

Something like (df.max() - df.min()).idxmax() should get you the label of a column with the maximum range, with no explicit loop needed:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.random((5,4)), index=pd.Series(range(1,6), name="week"), columns=["City{}".format(i) for i in range(1,5)])
>>> df
         City1     City2     City3     City4
week                                        
1     0.908549  0.496167  0.220340  0.464060
2     0.429330  0.770133  0.824774  0.155694
3     0.893270  0.980108  0.574897  0.378443
4     0.982410  0.796103  0.080877  0.416432
5     0.444416  0.667695  0.459362  0.898792
>>> df.max() - df.min()
City1    0.553080
City2    0.483941
City3    0.743898
City4    0.743098
dtype: float64
>>> (df.max() - df.min()).idxmax()
'City3'
>>> df[(df.max() - df.min()).idxmax()]
week
1       0.220340
2       0.824774
3       0.574897
4       0.080877
5       0.459362
Name: City3, dtype: float64
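
Since the frame above is random, the numbers will differ on every run. As a reproducible version of the same idea, here is a minimal sketch as a plain script; the city values and the names `col_range` and `widest` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: three weeks, three cities (values invented for illustration).
df = pd.DataFrame(
    {"City1": [10, 12, 11], "City2": [5, 20, 9], "City3": [7, 7, 8]},
    index=pd.Index([1, 2, 3], name="week"),
)

# Range of each column: column-wise max minus column-wise min.
col_range = df.max() - df.min()

# Label of the column with the largest range (the first one if there is a tie).
widest = col_range.idxmax()

print(col_range)
print(widest)
```

With this data the per-column ranges are 2, 15 and 1, so `widest` is 'City2'.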

If more than one column might share the maximum range, you'll probably want something like this instead, which keeps every tied column:

>>> col_ranges = df.max() - df.min()
>>> df.loc[:, col_ranges == col_ranges.max()]
         City3
week          
1     0.220340
2     0.824774
3     0.574897
4     0.080877
5     0.459362
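
One caveat with float data: an exact == comparison can miss ties that differ only by rounding error. A possible variation, assuming a tolerance-based comparison is acceptable for your data, is to build the mask with numpy.isclose. The data and the name `tied` below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: City1 and City2 both have a range of about 0.3,
# but the two subtractions may not produce bit-identical floats.
df = pd.DataFrame(
    {"City1": [0.1, 0.4], "City2": [0.2, 0.5], "City3": [0.0, 0.1]},
    index=pd.Index([1, 2], name="week"),
)

col_ranges = df.max() - df.min()

# Boolean mask: True for every column whose range is within numpy's
# default tolerance of the largest range.
is_max = np.isclose(col_ranges, col_ranges.max())

# Keep all columns that tie (approximately) for the maximum range.
tied = df.loc[:, is_max]

print(tied)
```

The default tolerances of numpy.isclose may be too loose or too tight for some data; its rtol and atol arguments can be adjusted.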

answered Oct 21 '22 17:10 by DSM