Python: Frequency of occurrences

Tags:

python

matplotlib

I have a list of integers and want to get the frequency of each integer. This was discussed here.

The problem is that the approach I'm using gives me frequencies of floating-point numbers, even though my data set consists of integers only. Why does that happen, and how can I get the frequency of integers from my data?

I'm using pyplot.hist to plot a histogram of the frequency of occurrences:

import numpy as np
import matplotlib.pyplot as plt

# Load the 5th column (index 4) of data.txt as integers.
# np.loadtxt splits on whitespace by default; add delimiter=',' for a CSV file.
data = np.loadtxt('data.txt', dtype=int, usecols=(4,))
plt.hist(data)  # plot the column as a histogram
plt.show()
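Since the comment calls data.txt a CSV file: np.loadtxt splits columns on whitespace by default, so a comma-separated file needs delimiter=','. A minimal sketch with a made-up in-memory file standing in for data.txt (the real file isn't shown):

```python
import io

import numpy as np

# Stand-in for data.txt: a small comma-separated file (contents are made up).
csv_text = io.StringIO("1,2,3,4,3\n1,2,3,4,5\n1,2,3,4,3\n")

# usecols=(4,) selects the 5th column (0-based index 4); delimiter=',' is
# needed because np.loadtxt splits on whitespace by default.
data = np.loadtxt(csv_text, dtype=int, delimiter=',', usecols=(4,))
print(data)  # [3 5 3]
```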

I'm getting the histogram, but I've noticed something odd when I print the result of np.histogram(data):

hist = np.histogram(data)  # returns a (counts, bin_edges) tuple
print(hist)

I get this:

(array([ 2323, 16338,  1587,   212,    26,    14,     3,     2,     2,     2]), 
array([  1. ,   2.8,   4.6,   6.4,   8.2,  10. ,  11.8,  13.6,  15.4,
    17.2,  19. ]))

Here the second array represents the values and the first array represents the number of occurrences.

All values in my data set are integers, so how does it happen that the second array contains floating-point numbers, and how can I get the frequency of each integer?

UPDATE:

This solves the problem. Thank you, Lev, for the reply.

plt.hist(data, bins=np.arange(data.min(), data.max()+1))

To avoid creating a new question: how can I plot the columns "in the middle" of each integer? Say, I want the column for integer 3 to take up the space between 2.5 and 3.5, not between 3 and 4.

[Image: histogram]


asked Mar 02 '14 12:03

user40

2 Answers

If you don't specify which bins to use, np.histogram and pyplot.hist use a default setting: 10 bins of equal width. The left border of the first bin is the smallest value and the right border of the last bin is the largest.
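To make the default concrete: with 10 bins the edges are 11 evenly spaced points from the minimum to the maximum, so for data spanning 1 to 19 each bin is (19 - 1) / 10 = 1.8 wide, which matches the edges printed in the question. A small sketch with made-up data over the same range:

```python
import numpy as np

# Made-up integer data spanning 1..19, the same range as in the question.
data = np.array([1, 2, 2, 3, 5, 8, 13, 19])

counts, edges = np.histogram(data)  # default: 10 equal-width bins

# The edges are 11 evenly spaced floats from data.min() to data.max(),
# so with a range of 18 each bin is 1.8 wide.
print(edges)
print(np.allclose(edges, np.linspace(data.min(), data.max(), 11)))  # True
print(len(counts), len(edges))  # 10 11
```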

This is why the bin borders are floating-point numbers. You can use the bins keyword argument to enforce another choice of bins, e.g.:

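# edges data.min(), ..., data.max()+1: one bin of width 1 per integer
# (NumPy's last bin is closed, so stopping at data.max() would merge data.max()-1 and data.max())
plt.hist(data, bins=np.arange(data.min(), data.max()+2))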
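A quick numerical check of integer-width bins, using np.histogram (which plt.hist calls to compute its counts) and comparing it against np.unique, which counts each distinct value directly. The data is made up and deliberately has adjacent values at the top of the range, to show why the edges should run to data.max() + 1:

```python
import numpy as np

# Made-up integer data; 18 and 19 are adjacent values at the top of the range.
data = np.array([1, 2, 2, 3, 3, 3, 18, 19, 19])

# One bin of width 1 per integer: edges 1, 2, ..., 20.
edges = np.arange(data.min(), data.max() + 2)
counts, _ = np.histogram(data, bins=edges)

# Same frequencies via np.unique (only for values that actually occur).
values, freqs = np.unique(data, return_counts=True)
print(dict(zip(values.tolist(), freqs.tolist())))  # {1: 1, 2: 2, 3: 3, 18: 1, 19: 2}
print(counts[values - data.min()].tolist())        # [1, 2, 3, 1, 2]

# Ending the edges at data.max() instead merges 18 and 19 into the last bin [18, 19].
short_counts, _ = np.histogram(data, bins=np.arange(data.min(), data.max() + 1))
print(short_counts[-1])  # 3
```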

Edit: the easiest way to shift all bins to the left is probably just to subtract 0.5 from all bin borders:

# edges at k - 0.5, so each integer k is centered in its own bin
plt.hist(data, bins=np.arange(data.min(), data.max()+2)-0.5)
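The same check for the shifted edges, again with made-up data: each bin is now [k - 0.5, k + 0.5), so its center is the integer k itself. The edges need to reach data.max() + 0.5; with data.max() + 1 as the arange end point, the last edge would be data.max() - 0.5 and the largest value would not be counted:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 18, 19, 19])  # made-up data

# Edges at k - 0.5 for k = min..max+1, so integer k sits in [k - 0.5, k + 0.5).
edges = np.arange(data.min(), data.max() + 2) - 0.5
counts, _ = np.histogram(data, bins=edges)

centers = (edges[:-1] + edges[1:]) / 2  # bin centers are the integers themselves
print(centers[:3].tolist())       # [1.0, 2.0, 3.0]
print(counts[:3].tolist())        # [1, 2, 3]
print(counts.sum() == len(data))  # True
```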

Another way to achieve the same effect (not equivalent if non-integers are present):

# align='left' centers each bar on its left bin edge, i.e. on the integer itself
plt.hist(data, bins=np.arange(data.min(), data.max()+2), align='left')
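A sketch checking the align='left' variant, run with matplotlib's non-interactive Agg backend (an assumption, so it works without a display). The Rectangle patches returned by hist can be inspected to confirm each bar is centered on an integer. The data is made up:

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

data = np.array([1, 2, 2, 3, 3, 3, 18, 19, 19])  # made-up data

fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=np.arange(data.min(), data.max() + 2), align='left')

# With align='left' each bar is centered on its left bin edge, i.e. on the integer.
bar_centers = [p.get_x() + p.get_width() / 2 for p in patches]
print(np.allclose(bar_centers, bins[:-1]))  # True
print([int(v) for v in n[:3]])              # [1, 2, 3]
```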


answered Sep 19 '22 13:09

Lev Levitsky

(Late to the party, just thought I would add a seaborn implementation)

Seaborn Implementation of the above question:

seaborn.__version__ = 0.9.0 at time of writing.

Load the libraries and set up mock data.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

Plot the data using `seaborn.distplot`:

Using specified bins, calculated as per the above question.

# seaborn.distplot is deprecated as of seaborn 0.11; this code targets 0.9.0
sns.distplot(data, bins=np.arange(data.min(), data.max()+2), kde=False, hist_kws={"align": "left"})
plt.show()

Trying `numpy` built-in binning methods

I used the doane binning method below, which produced integer bins. It might be worth trying out the other standard binning methods from numpy.histogram_bin_edges, since this is how matplotlib's hist() bins the data.

sns.distplot(data,bins="doane",kde=False,hist_kws={"align" : "left"})
plt.show()

This produces the histogram below:

[Image: histogram]
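To compare the built-in estimators without plotting, np.histogram_bin_edges returns the edges each one would use. A sketch with the mock data from this answer; the choice of estimators to compare is arbitrary:

```python
import numpy as np

# Same mock data as above.
data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

# np.histogram_bin_edges accepts the same estimator names as plt.hist(bins=...),
# e.g. 'auto', 'fd', 'doane', 'scott', 'stone', 'rice', 'sturges', 'sqrt'.
for method in ["doane", "sturges", "sqrt"]:
    edges = np.histogram_bin_edges(data, bins=method)
    # For a string estimator the edges span [data.min(), data.max()].
    print(method, len(edges) - 1, "bins:", edges)
```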

answered Sep 16 '22 13:09

RK1
