Histogram values of a Pandas Series

Tags: python, pandas, matplotlib, numpy

I have some values in a Python Pandas Series (type: pandas.core.series.Series)

    In [1]: series = pd.Series([0.0,950.0,-70.0,812.0,0.0,-90.0,0.0,0.0,-90.0,0.0,-64.0,208.0,0.0,-90.0,0.0,-80.0,0.0,0.0,-80.0,-48.0,840.0,-100.0,190.0,130.0,-100.0,-100.0,0.0,-50.0,0.0,-100.0,-100.0,0.0,-90.0,0.0,-90.0,-90.0,63.0,-90.0,0.0,0.0,-90.0,-80.0,0.0,])
    In [2]: series.min()
    Out[2]: -100.0
    In [3]: series.max()
    Out[3]: 950.0

I would like to get the values of a histogram (not necessarily plot it)... I just need the frequency for each interval.

Let's say that my intervals go from [-200; -150] to [950; 1000]

so lower bounds are

lwb = range(-200,1000,50)

and upper bounds are

upb = range(-150,1050,50)

I don't know how to get the frequency (the number of values that fall inside each interval) now... I'm sure that defining lwb and upb is not necessary... but I don't know what function I should use to do this! (After diving into the Pandas docs, I think the cut function can help me because it's a discretization problem... but I don't understand how to use it.)

Once I am able to do this, I will look at how to display the histogram (but that's another problem).

asked Oct 29 '12 21:10

Femto Trader

2 Answers

You just need to use the histogram function of NumPy:

    import numpy as np
    count, division = np.histogram(series)

where division holds the automatically calculated bin edges and count holds the number of values inside each bin.
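As a minimal sketch of what the default call returns for the series in the question (data copied from the question, variable names as in the answer): with no bins argument, np.histogram uses 10 equal-width bins spanning the minimum and maximum of the data, so division has one more entry than count.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
series = pd.Series([0.0, 950.0, -70.0, 812.0, 0.0, -90.0, 0.0, 0.0, -90.0, 0.0,
                    -64.0, 208.0, 0.0, -90.0, 0.0, -80.0, 0.0, 0.0, -80.0, -48.0,
                    840.0, -100.0, 190.0, 130.0, -100.0, -100.0, 0.0, -50.0, 0.0,
                    -100.0, -100.0, 0.0, -90.0, 0.0, -90.0, -90.0, 63.0, -90.0,
                    0.0, 0.0, -90.0, -80.0, 0.0])

count, division = np.histogram(series)

# Default binning: 10 equal-width bins between series.min() and series.max().
print(len(count), len(division))   # number of counts, number of edges
print(division[0], division[-1])   # first and last edge (the data's min and max)
print(count.sum() == len(series))  # every value lands in exactly one bin
```

The default bins are 105 units wide here, so they do not line up with the 50-wide intervals asked for in the question; explicit bin edges are needed for that.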

If you need to control the binning, use the bins argument: pass either a number of bins or, directly, the sequence of boundaries between the bins.

count, division = np.histogram(series, bins = [-201,-149,949,1001])
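For the 50-wide intervals described in the question, a sketch along these lines should work. The names edges and freq are illustrative and not from the original answer. Note that np.histogram treats every bin as half-open [lower, upper) except the last one, which also includes its upper edge.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
series = pd.Series([0.0, 950.0, -70.0, 812.0, 0.0, -90.0, 0.0, 0.0, -90.0, 0.0,
                    -64.0, 208.0, 0.0, -90.0, 0.0, -80.0, 0.0, 0.0, -80.0, -48.0,
                    840.0, -100.0, 190.0, 130.0, -100.0, -100.0, 0.0, -50.0, 0.0,
                    -100.0, -100.0, 0.0, -90.0, 0.0, -90.0, -90.0, 63.0, -90.0,
                    0.0, 0.0, -90.0, -80.0, 0.0])

# Edges -200, -150, ..., 1000: 25 edges define 24 bins of width 50.
edges = np.arange(-200, 1001, 50)
count, division = np.histogram(series, bins=edges)

# Pair each count with its (lower, upper) bounds and keep only non-empty bins.
freq = {
    (int(lo), int(hi)): int(c)
    for lo, hi, c in zip(division[:-1], division[1:], count)
    if c > 0
}

for (lo, hi), c in freq.items():
    print(f"{lo} to {hi}: {c}")
```

Pairing division[:-1] with division[1:] recovers the lower and upper bounds, so separate lwb and upb ranges are not needed.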

To plot the results you can use the matplotlib function hist, but if you are working in pandas, each Series has its own hist method, and you can pass it the chosen binning:

series.hist(bins=division)

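Edit: As mentioned by another poster, Pandas is built on top of NumPy. On older Pandas versions you could do away with the additional import by accessing NumPy through Pandas (the pd.np alias was deprecated in Pandas 1.0 and removed in 2.0, so current versions need the explicit import numpy as np):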

count, division = pd.np.histogram(series)

answered Oct 11 '22 07:10

EnricoGiampieri

To get the frequency counts of the values in a given set of bins, we can use pd.cut, which assigns each element to a half-open bin, together with value_counts, which computes the count for each bin.

The counts can then be shown as a bar plot.

    import numpy as np

    step = 50
    bin_range = np.arange(-200, 1000 + step, step)  # bin edges -200, -150, ..., 1000
    # right=False gives half-open bins [a, b); retbins=True also returns the edges
    out, bins = pd.cut(series, bins=bin_range, include_lowest=True, right=False, retbins=True)
    out.value_counts().plot.bar()

Frequency for each interval sorted in descending order of their counts:

    out.value_counts().head()
    [-100, -50)    18
    [0, 50)        16
    [800, 850)      2
    [-50, 0)        2
    [950, 1000)     1
    dtype: int64
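If the counts are wanted in bin order rather than by descending frequency, one option is to sort the result by its index. This is a sketch under that assumption; sort_index and the names counts and nonzero are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
series = pd.Series([0.0, 950.0, -70.0, 812.0, 0.0, -90.0, 0.0, 0.0, -90.0, 0.0,
                    -64.0, 208.0, 0.0, -90.0, 0.0, -80.0, 0.0, 0.0, -80.0, -48.0,
                    840.0, -100.0, 190.0, 130.0, -100.0, -100.0, 0.0, -50.0, 0.0,
                    -100.0, -100.0, 0.0, -90.0, 0.0, -90.0, -90.0, 63.0, -90.0,
                    0.0, 0.0, -90.0, -80.0, 0.0])

bin_range = np.arange(-200, 1000 + 50, 50)
# right=False makes each bin half-open on the right: [lower, upper).
out = pd.cut(series, bins=bin_range, right=False)

# value_counts() orders by count; sort_index() restores the bin order.
counts = out.value_counts().sort_index()

# Drop the empty bins to get a compact frequency table.
nonzero = counts[counts > 0]
print(nonzero)
```

Because pd.cut returns an ordered categorical, sorting the index follows the order of the intervals rather than the order of their string labels.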

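To label the plot with just the lower (closed) bound of each interval, for aesthetic purposes, you could do:

    # rename each interval category to its left edge; rename_categories is used
    # because newer pandas versions reject direct assignment to .cat.categories
    out = out.cat.rename_categories(bins[:-1])
    out.value_counts().plot.bar()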

answered Oct 11 '22 08:10

Nickil Maveli
