I have a DataFrame df:
name count
aaaa 2000
bbbb 1900
cccc 900
dddd 500
eeee 100
I would like to look at the rows that are within a factor of 10 from the median of the count
column.
I tried df['count'].median() and got the median, but I don't know how to proceed further. Can you suggest how I could use pandas/numpy for this?
Expected Output :
name count distance from median
aaaa 2000 *****
I can use any measure as the distance from median (absolute deviation from median, quantiles etc.).
If you want to see the median, you can use df.describe(). The 50% value is the median.
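As a minimal sketch of the describe() approach, assuming the sample data from the question is rebuilt by hand (the variable names summary, median_from_describe and median_direct are only illustrative):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# describe() summarises numeric columns only by default,
# so the result has a single column here: "count"
summary = df.describe()

# Row label "50%" holds the median; the column label is the data column "count"
median_from_describe = summary.loc["50%", "count"]
median_direct = df["count"].median()

print(median_from_describe, median_direct)
```

Both values should agree, so describe() is mainly useful when you want the other summary statistics at the same time.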
In a data set with an odd number of values, the median is the middle element once the values are sorted. If there is an even number of values, the median is the average of the middle two. For example, in the group of values {1, 2, 3, 4, 7} the median is 3.
To get the average (mean) of a column in a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values along the requested axis.
The median() method calculates the median (middle value) of the given data set. The median is defined on the values in ascending order, so the input does not need to be sorted beforehand. Tip: for an odd number of values n, the formula is Median = {(n + 1) / 2}th value of the sorted data. For an even n, the median is the average of the (n / 2)th and (n / 2 + 1)th values.
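The definition above can be sketched by hand as follows. This is only an illustration of the rule, not how pandas computes it internally; manual_median is a hypothetical helper name, and the standard library's statistics.median is used as a cross-check:

```python
import statistics

def manual_median(values):
    """Median by the textbook rule: sort, then take the middle value(s)."""
    ordered = sorted(values)      # does not modify the caller's data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)-th value, i.e. index n // 2 when counting from 0
        return ordered[mid]
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = manual_median([1, 2, 3, 4, 7])
even_example = manual_median([4, 1, 3, 2])
print(odd_example, even_example)
print(statistics.median([1, 2, 3, 4, 7]), statistics.median([4, 1, 3, 2]))
```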
If you're looking for how to calculate the median absolute deviation:
In [1]: df['dist'] = abs(df['count'] - df['count'].median())
In [2]: df
Out[2]:
name count dist
0 aaaa 2000 1100
1 bbbb 1900 1000
2 cccc 900 0
3 dddd 500 400
4 eeee 100 800
In [3]: df['dist'].median()
Out[3]: 800.0
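To go from the distance column to the filter asked about in the question, here is a rough sketch. It assumes "within a factor of 10 of the median" means median / 10 <= count <= median * 10; that reading is my assumption, and the column name distance_from_median and the variables lower, upper and within are only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

median = df["count"].median()

# Absolute deviation of each row from the median
df["distance_from_median"] = (df["count"] - median).abs()

# Assumed interpretation: keep rows with median / 10 <= count <= median * 10
lower, upper = median / 10, median * 10
within = df[df["count"].between(lower, upper)]   # bounds are inclusive by default

print(within)
```

With the sample data the median is 900, so the bounds are 90 and 9000 and every row passes the filter. A tighter rule, such as keeping rows whose distance is at most some multiple of the median absolute deviation, can be written the same way with a boolean mask on distance_from_median.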