There is a method to plot Series histograms, but is there a function to retrieve the histogram counts so I can do further calculations on them?
I keep using numpy's functions for this and converting the result to a DataFrame or Series afterwards. It would be nice to stay with pandas objects the whole time.
Pandas Series: value_counts() function
The value_counts() function returns a Series containing counts of unique values. The result is sorted in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.
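A minimal sketch of that behaviour on a small, made-up Series of strings with one missing value; dropna is the real value_counts parameter that controls whether NA values get their own count.

```python
import numpy as np
import pandas as pd

# A small Series with repeated values and one missing entry
s = pd.Series(["a", "b", "a", "c", "a", "b", np.nan])

# Counts of unique values, most frequent first; NaN is dropped by default
counts = s.value_counts()
print(counts)

# dropna=False keeps a separate count for the missing values
counts_with_na = s.value_counts(dropna=False)
print(counts_with_na)
```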
Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the number of values in each group, ignoring None and NaN values. It works with non-floating-point data as well.
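The sketch below uses a hypothetical DataFrame to show what "ignoring None and NaN" means in practice; size() is included only as a contrast, since it counts rows per group whether or not values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one string key column, one numeric column with gaps
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": [1.0, np.nan, 3.0, 2.0, 5.0, None],
})

# count() reports, per group, how many non-null values each column holds
per_group = df.groupby("color").count()
print(per_group)

# size() counts all rows per group, including rows with missing values
rows_per_group = df.groupby("color").size()
print(rows_per_group)
```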
To plot a histogram with pandas, chain the .hist() method onto the DataFrame. This plots a histogram for each numeric column in the DataFrame.
Bins are the buckets into which your histogram's data is grouped. Behind the scenes, pandas assigns your data to bins, counts how many values fall into each bucket, and plots the result.
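The "group into bins, then count" step can be reproduced without plotting. This sketch uses pd.cut with hand-picked edges on made-up data; it is not necessarily how hist() computes its bins internally. Note that pd.cut intervals are right-closed, such as (1, 2], by default.

```python
import pandas as pd

s = pd.Series([0.2, 0.7, 1.1, 1.4, 1.9, 2.5, 2.6, 3.8])

# Explicit bin edges, giving the intervals (0, 1], (1, 2], (2, 3], (3, 4]
edges = [0, 1, 2, 3, 4]

# Assign each value to a bin, then count how many values fell into each one
binned = pd.cut(s, bins=edges)
counts = binned.value_counts(sort=False)
print(counts)
```

With sort=False the counts stay in bin order, so the result reads like a histogram table rather than a frequency ranking.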
If your Series is discrete, you could use value_counts:
In [11]: s = pd.Series([1, 1, 2, 1, 2, 2, 3])

In [12]: s.value_counts()
Out[12]:
2    3
1    3
3    1
dtype: int64
You can see that, for discrete data, s.hist() shows essentially the same information as a bar plot of s.value_counts().
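Because value_counts returns an ordinary Series, follow-up calculations stay in pandas. A minimal sketch, reusing the discrete Series from the session above: sort_index() puts the counts in value order (the way a histogram reads), and the proportions and cumulative counts are just illustrative follow-ups.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 1, 2, 2, 3])

# Counts ordered by value rather than by frequency
counts = s.value_counts().sort_index()

# Further calculations on the counts, all as pandas Series
proportions = counts / counts.sum()
cumulative = counts.cumsum()
print(counts)
print(proportions)
print(cumulative)
```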
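If the values are floats, an admittedly hacky solution is to use groupby, keying each value by the left edge of a bin of width 0.5:
import numpy as np
# floor(2*x)/2 maps each value to the left edge of its 0.5-wide bin; count() tallies each bin
s.groupby(lambda i: np.floor(2*s[i]) / 2).count()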
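The same groupby idea can be written without the lambda by passing a Series of bin keys that aligns with s on its index. This sketch uses made-up data; it avoids a Python-level lookup for every index label but otherwise bins exactly as the lambda version does.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.4, 0.6, 0.9, 1.2, 1.3, 1.7])

# Key each value by the left edge of its 0.5-wide bin: floor(2*x)/2
keys = np.floor(2 * s) / 2

# Group by the aligned key Series, then count the values in each bin
counts = s.groupby(keys).count()
print(counts)
```

One difference from a real histogram: groupby only produces keys that actually occur, so empty bins are missing rather than reported as zero.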
Since hist and value_counts don't use the Series' index, you may as well treat the Series like an ordinary array, use np.histogram directly, and then build a Series from the result.
In [4]: s = Series(randn(100))

In [5]: counts, bins = np.histogram(s)

In [6]: Series(counts, index=bins[:-1])
Out[6]:
-2.968575     1
-2.355032     4
-1.741488     5
-1.127944    26
-0.514401    23
 0.099143    23
 0.712686    12
 1.326230     5
 1.939773     0
 2.553317     1
dtype: int32
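The session above can be wrapped in a small reusable helper. histogram_series is a hypothetical name, not a pandas or numpy function; the sketch drops NaN first because np.histogram cannot auto-detect a bin range when the data contains NaN.

```python
import numpy as np
import pandas as pd


def histogram_series(s, bins=10):
    """Hypothetical helper: np.histogram counts as a Series indexed by left bin edges."""
    # Drop missing values so np.histogram can compute a finite range
    counts, edges = np.histogram(s.dropna(), bins=bins)
    # edges has one more entry than counts; keep the left edge of each bin
    return pd.Series(counts, index=edges[:-1], name=s.name)


s = pd.Series([0.0, 0.5, 1.0, 1.5, 2.0, np.nan], name="x")
hist = histogram_series(s, bins=4)
print(hist)
```

np.histogram bins are half-open [a, b) except the last, which includes its right edge, so here 2.0 lands in the final bin.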
This is a really convenient way to organize the result of a histogram for subsequent computation.
To index by the center of each bin instead of the left edge, you could use bins[:-1] + np.diff(bins)/2.
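A short sketch of the bin-center index on made-up data, followed by one example of a further calculation: estimating the mean from the binned counts alone. The estimate is only approximate, since every value is treated as if it sat at its bin's midpoint.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0])
counts, bins = np.histogram(s, bins=3)

# Index by bin midpoints instead of left edges
centers = bins[:-1] + np.diff(bins) / 2
hist = pd.Series(counts, index=centers)

# Approximate mean, weighting each bin midpoint by its count
approx_mean = (hist.index.to_numpy() * hist).sum() / hist.sum()
print(hist)
print(approx_mean)
```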