
Median of pandas dataframe column

I have a DataFrame df:

name   count    
aaaa   2000    
bbbb   1900    
cccc    900    
dddd    500    
eeee    100

I would like to look at the rows that are within a factor of 10 from the median of the count column.

I tried df['count'].median() and got the median, but I don't know how to proceed from there. Can you suggest how I could use pandas/numpy for this?

Expected output:

name   count   distance from median
aaaa   2000    *****

I can use any measure as the distance from median (absolute deviation from median, quantiles etc.).

Ssank asked Apr 21 '15 16:04



2 Answers

If you're looking for the median absolute deviation (MAD), compute each row's absolute deviation from the median of count, then take the median of those deviations:

In [1]: df['dist'] = abs(df['count'] - df['count'].median())

In [2]: df
Out[2]:
   name  count  dist
0  aaaa   2000  1100
1  bbbb   1900  1000
2  cccc    900     0
3  dddd    500   400
4  eeee    100   800

In [3]: df['dist'].median()
Out[3]: 800.0
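
The question also asks for the rows within a factor of 10 of the median. What that means exactly is not stated, so the sketch below assumes one possible reading: a row qualifies when median / 10 <= count <= median * 10. The `ratio` column and the `within` variable are illustrative names, not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

median = df["count"].median()

# Relative distance from the median: 1.0 means "equal to the median".
df["ratio"] = df["count"] / median

# Assumption: "within a factor of 10" means median / 10 <= count <= median * 10.
within = df[df["count"].between(median / 10, median * 10)]
print(within)
```

With the sample data every row passes this filter, because the smallest count (100) is still above median / 10. A tighter factor, or the absolute deviation computed above, may be a more useful cut-off for data this narrow.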
ComputerFellow answered Sep 28 '22 08:09


If you want to see the median, you can use df.describe(). The 50% value is the median.
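
As a small illustration of that, the sketch below rebuilds the count column from the question (the variable names `counts` and `summary` are assumptions made for the example) and reads the 50% entry from the describe() summary:

```python
import pandas as pd

# The count column from the question, rebuilt as a standalone Series.
counts = pd.Series([2000, 1900, 900, 500, 100], name="count")

# describe() returns count, mean, std, min, 25%, 50%, 75% and max.
summary = counts.describe()

print(summary["50%"])   # the median as reported by describe()
print(counts.median())  # the same value, computed directly
```

describe() is convenient for a quick look at the data; when only the median is needed, calling median() directly avoids computing the other statistics.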

Marjan Alavi answered Sep 28 '22 10:09