Are there functions to retrieve the histogram counts of a Series in pandas?

There is a method to plot Series histograms, but is there a function to retrieve the histogram counts to do further calculations on top of it?

I keep using numpy's functions to do this and converting the result to a DataFrame or Series when I need this. It would be nice to stay with pandas objects the whole time.

Rafael S. Calsaverini asked Jun 17 '13 13:06


2 Answers

If your Series was discrete you could use value_counts:

    In [11]: s = pd.Series([1, 1, 2, 1, 2, 2, 3])

    In [12]: s.value_counts()
    Out[12]:
    2    3
    1    3
    3    1
    dtype: int64

For discrete data like this, s.hist() plots essentially the same counts that s.value_counts() returns, so s.value_counts().plot(kind='bar') gives a comparable picture.

If the Series holds floats, an awful, hacky solution could be to use groupby, here binning the values into intervals of width 0.5:

s.groupby(lambda i: np.floor(2*s[i]) / 2).count() 
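
As a rough sketch of the same idea without the lambda, the values can be floored to the nearest 0.5 and passed to groupby directly, or binned explicitly with pd.cut. The sample data and the bin edges below are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data only (an assumption, not taken from the question).
s = pd.Series([0.1, 0.2, 0.6, 0.7, 0.8, 1.2, 1.4])

# Same binning as the groupby above: floor each value to the nearest 0.5,
# then count how many values share each left edge.
by_floor = s.groupby(np.floor(2 * s) / 2).count()

# Alternative: pd.cut with explicit edges and half-open bins [left, right).
edges = np.array([0.0, 0.5, 1.0, 1.5])  # assumed edges for this sample
by_cut = pd.cut(s, bins=edges, right=False).value_counts().sort_index()

print(by_floor)
print(by_cut)
```

Both results are ordinary Series, so they can feed further pandas calculations. value_counts also accepts a bins argument (for example s.value_counts(bins=4)) if equal-width bins chosen by pandas are acceptable.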
Andy Hayden answered Oct 09 '22 22:10


Since hist and value_counts don't use the Series' index, you may as well treat the Series like an ordinary array and use np.histogram directly. Then build a Series from the result.

    In [4]: s = Series(randn(100))

    In [5]: counts, bins = np.histogram(s)

    In [6]: Series(counts, index=bins[:-1])
    Out[6]:
    -2.968575     1
    -2.355032     4
    -1.741488     5
    -1.127944    26
    -0.514401    23
     0.099143    23
     0.712686    12
     1.326230     5
     1.939773     0
     2.553317     1
    dtype: int32

This is a really convenient way to organize the result of a histogram for subsequent computation.

To index by the center of each bin instead of the left edge, you could use bins[:-1] + np.diff(bins)/2.
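
A self-contained version of this approach, indexed by bin centers, might look like the sketch below. The random seed, the sample size, and the follow-up calculations (share and cumulative) are illustrative assumptions added here to show the result being reused; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Reproducible sample data (seed and size are arbitrary choices).
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# np.histogram returns the counts and the bin edges (one more edge than bins).
counts, edges = np.histogram(s.to_numpy(), bins=10)

# Index by the center of each bin instead of its left edge.
centers = edges[:-1] + np.diff(edges) / 2
hist = pd.Series(counts, index=centers, name="count")

# The histogram is now a regular Series, so further calculations stay in pandas.
share = hist / hist.sum()   # fraction of observations per bin
cumulative = hist.cumsum()  # running total of observations

print(hist)
```

Keeping the edges array around is useful if the exact bin boundaries are needed later, since the centers alone do not record them.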

Dan Allan answered Oct 09 '22 23:10