
Why is pandas inserting spaces in my histogram?

Example data can be found here in CSV format.

Given the following code:

figure()
grp.vis.plot(kind='hist', alpha=.5, normed=True)
show()

I obtain the following figure:

[figure: the resulting histogram, with gaps between some of the bars]

Why is pandas inserting gaps in the figure? The values range from 0 to 7, and are all represented, so I see no reason why this should happen.

Thanks very much in advance!

Louis Thibault asked May 11 '16 10:05


2 Answers

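Because hist uses bins=10 by default. Your data contains only the 8 integer values 0 to 7, so ten equal-width bins over that range leave some bins with no values in them, and those empty bins show up as gaps. Set bins explicitly: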

grp.vis.plot(kind='hist', alpha=.5, bins=7, normed=True)

[figure: histogram with bins=7]
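To see where the gaps come from, the bin edges can be checked directly with numpy. This is only a minimal sketch: the `vis` array below is a made-up stand-in for the real column (the CSV is not reproduced here), and it assumes the plot uses ten equal-width bins between the minimum and maximum of the data, which is what `numpy.histogram` does for an integer `bins`.

```python
import numpy as np

# Hypothetical stand-in for the `vis` column: integers from 0 to 7.
vis = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7])

# Ten equal-width bins over [0, 7], i.e. each bin is 0.7 wide.
counts, edges = np.histogram(vis, bins=10)

# Collect the bins that contain no integer value at all.
empty_bins = [
    (round(float(lo), 1), round(float(hi), 1))
    for lo, hi, n in zip(edges[:-1], edges[1:], counts)
    if n == 0
]

print(counts)      # counts per bin, two of them are zero
print(empty_bins)  # the intervals that show up as gaps
```

With integer data, the intervals from 2.1 to 2.8 and from 4.2 to 4.9 contain no integer, so those two bars have zero height regardless of how the values are distributed.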

If you omit normed:

grp.vis.plot(kind='hist', alpha=.5, bins=7)

[figure: histogram with bins=7, without normed]

Docs:

bins : integer or array_like, optional

If an integer is given, bins + 1 bin edges are returned, consistently with numpy.histogram() for numpy version >= 1.3.

Unequally spaced bins are supported if bins is a sequence.

default is 10

rwidth : scalar or None, optional

The relative width of the bars as a fraction of the bin width. If None, automatically compute the width.

Ignored if histtype is ‘step’ or ‘stepfilled’.

Default is None
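Since the docs say `bins` can also be a sequence, one option is to pass explicit edges centred on each integer. The following is a rough sketch, not the code from the question: `vis` is made-up data standing in for `grp.vis`, and `density=True` is used because, as far as I know, recent matplotlib releases dropped `normed` in favour of `density`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd

# Hypothetical stand-in for the `vis` column: integers from 0 to 7.
vis = pd.Series([0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7], name="vis")

# Edges at -0.5, 0.5, ..., 7.5: one unit-wide bin centred on each integer.
lo, hi = int(vis.min()), int(vis.max())
edges = np.arange(lo, hi + 2) - 0.5

# `bins` accepts a sequence of edges; `density=True` normalises the total area to 1.
ax = vis.plot(kind="hist", bins=edges, density=True, alpha=0.5)
ax.set_xticks(range(lo, hi + 1))  # put a tick under every bar
```

In an interactive session, calling `matplotlib.pyplot.show()` afterwards (and leaving out the `Agg` line) displays the figure. Every integer gets its own bar, so no bar is empty and no two values share a bin.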

jezrael answered Nov 11 '22 00:11


Sorry for a bit of off-topic self-promotion, but you might find my library physt useful (see https://github.com/janpipek/physt ). Among other features, it provides different binning schemes, one of which ("integer") is suited to automatic binning of integer data.

import pandas as pd
import physt

df = pd.read_csv("visanal_so.csv")
ax = physt.h1(df.vis, "integer").plot(density=True, alpha=0.5)
ax.set_ylabel("Frequency");

[figure: what you get]

P.S. Note that the plot is similar to the original but differs from what @jezrael shows. The automatic pandas binning behaves somewhat strangely and definitely not in the way you intended.
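One possible reason for that difference can be checked with numpy alone. This is a sketch on made-up data (the `vis` array is hypothetical, not the real CSV): with `bins=7` over the range 0 to 7 the edges fall on the integers, and because the last bin is closed on the right, the values 6 and 7 end up in the same bar. Integer-centred edges keep them apart.

```python
import numpy as np

# Hypothetical stand-in for the `vis` column: integers from 0 to 7.
vis = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7])

# bins=7 gives edges 0, 1, ..., 7; the last bin [6, 7] holds both 6 and 7.
counts_7, edges_7 = np.histogram(vis, bins=7)

# One unit-wide bin per integer: edges at -0.5, 0.5, ..., 7.5.
integer_edges = np.arange(vis.min(), vis.max() + 2) - 0.5
counts_int, _ = np.histogram(vis, bins=integer_edges)

print(counts_7)    # 7 bars, the last one mixes the values 6 and 7
print(counts_int)  # 8 bars, one per distinct value
```

The last entry of `counts_7` equals the sum of the last two entries of `counts_int`, which is why a `bins=7` plot has one bar fewer than the data has distinct values.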

honza_p answered Nov 10 '22 22:11