It is relatively easy to manually give a list of bins when plotting a histogram with matplotlib, as shown for example here.
A simple example of this is the following:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(np.random.randn(10000), bins=np.arange(-4, 4, 0.1))
ax.hist(0.2 * np.random.randn(10000), bins=np.arange(-4, 4, 0.1))
plt.show()
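A detail worth knowing about the `bins` array above: `np.arange` excludes its stop value, so `np.arange(-4, 4, 0.1)` ends just below 4 and the last bin does not reach 4. The following is a minimal sketch, not from the original post: it uses `np.linspace` as an alternative way to build edges that include both endpoints, and `np.histogram` with those edges to compute the per-bin counts that `ax.hist` draws. The seeded generator is only there to make the sketch reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
wide = rng.standard_normal(10000)
narrow = 0.2 * rng.standard_normal(10000)

# np.arange stops *before* 4, so the final edge is about 3.9
arange_edges = np.arange(-4, 4, 0.1)

# np.linspace includes both endpoints: 81 edges -> 80 bins of width 0.1 on [-4, 4]
edges = np.linspace(-4, 4, 81)

# Passing the *same* edges to both datasets gives directly comparable counts
wide_counts, _ = np.histogram(wide, bins=edges)
narrow_counts, _ = np.histogram(narrow, bins=edges)

print(arange_edges[-1], edges[-1])
print(wide_counts.max(), narrow_counts.max())
```

With shared edges, the narrower distribution's peak bin holds several times more samples than the wider one's, which is what a fair comparison should show.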
The same plot can also be produced from a pandas.DataFrame with:
import pandas as pd
pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).plot(kind='hist', bins=np.arange(-4, 4, 0.1))
plt.show()
Going further, plotly can interface directly with pandas through the cufflinks module, which makes it possible to do things like the following:
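import numpy as np
import pandas as pd
import cufflinks as cf
cf.go_offline()  # render DataFrame.iplot output with plotly's offline mode
pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).iplot(kind='hist', bins=100)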
But here is the catch: the iplot method provided by cufflinks does not seem to accept a list for bins.
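When a number is provided, as in the example above, that number is used to bin each dataset independently. This results in unequal bin widths and potentially misleading results (see the equal peak heights in the above plot: with a common bin width, the narrower distribution, whose standard deviation is 5 times smaller, would peak about five times higher).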
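To see why independent binning misleads, here is a small numpy sketch. It does not call cufflinks or plotly; as a stand-in it uses numpy's `bins=100`, which splits each dataset's own min-to-max range into 100 equal bins. Plotly's automatic binning picks "nice" bin sizes and differs in detail, so treat this only as an illustration of the per-trace behaviour described above.

```python
import numpy as np

rng = np.random.default_rng(1)
wide = rng.standard_normal(10000)
narrow = 0.2 * rng.standard_normal(10000)

# Independent binning: each dataset gets 100 bins over its OWN min..max range
wide_counts, wide_edges = np.histogram(wide, bins=100)
narrow_counts, narrow_edges = np.histogram(narrow, bins=100)
wide_width = wide_edges[1] - wide_edges[0]
narrow_width = narrow_edges[1] - narrow_edges[0]

# Shared binning: both datasets use the same 100 bins of width 0.08 on [-4, 4]
shared = np.linspace(-4, 4, 101)
wide_shared, _ = np.histogram(wide, bins=shared)
narrow_shared, _ = np.histogram(narrow, bins=shared)

print(f"bin widths: {wide_width:.3f} vs {narrow_width:.3f}")
print("independent peaks:", wide_counts.max(), narrow_counts.max())
print("shared peaks:     ", wide_shared.max(), narrow_shared.max())
```

With independent bins the narrow dataset's bins are roughly five times thinner, which cancels its roughly five times higher density, so the two peak counts come out similar. With shared bins the difference in peak height reappears.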
While this effect can be somewhat mitigated with the histnorm='density' option, one may want to see the counts per bin rather than a density.
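Plotly documents `histnorm='density'` as the number of occurrences in a bin divided by the size of the bin interval. A minimal numpy sketch of that arithmetic (the helper name `count_density` is hypothetical, and numpy's `bins=100` again stands in for per-trace automatic binning) shows why it makes independently binned histograms comparable, at the cost of no longer showing raw counts:

```python
import numpy as np

rng = np.random.default_rng(2)
wide = rng.standard_normal(10000)
narrow = 0.2 * rng.standard_normal(10000)

def count_density(data, bins=100):
    """Hypothetical helper: count in each bin divided by that bin's width."""
    counts, edges = np.histogram(data, bins=bins)
    return counts / np.diff(edges)

# Expected peak count densities: N * 0.399 for sigma=1 and N * 0.399 / 0.2
# for sigma=0.2, so their ratio should be close to 5 despite unequal bins
wide_peak = count_density(wide).max()
narrow_peak = count_density(narrow).max()
print(round(narrow_peak / wide_peak, 1))
```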
Is there a way around this, i.e. a way to make iplot use a common set of bins for all columns?
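I have added an update for this. You should now be able to specify bins=(start, end, size):
import numpy as np
import pandas as pd
import cufflinks as cf  # requires a cufflinks version that includes this update
pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)}).iplot(kind='hist', bins=(-4, 4, .08))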
This should now return:
[Figure: Custom bins]
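A `(start, end, size)` tuple describes the same shared edges as the explicit arrays used with matplotlib, and it mirrors the keys of plotly's `xbins` attribute (`start`, `end`, `size`). The sketch below uses a hypothetical helper, `bins_spec`, which is not part of cufflinks; it assumes `size` divides `end - start` evenly.

```python
import numpy as np

def bins_spec(start, end, size):
    """Hypothetical helper: turn a (start, end, size) spec into explicit
    bin edges plus the equivalent plotly ``xbins`` dict.
    Assumes ``size`` divides ``end - start`` evenly."""
    n_bins = int(round((end - start) / size))
    # linspace avoids the float drift of repeatedly adding `size`
    edges = np.linspace(start, end, n_bins + 1)
    xbins = {'start': start, 'end': end, 'size': size}
    return edges, xbins

edges, xbins = bins_spec(-4, 4, 0.08)
print(len(edges) - 1, edges[0], edges[-1])  # number of bins and outer edges
print(xbins)
```

The same `edges` array could be passed to `ax.hist` or `DataFrame.plot(kind='hist', ...)`, and the `xbins` dict to a plotly histogram trace, so every library bins the data identically.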
As far as I know, there is no direct way of doing this in cufflinks. In my opinion, the output produced by your code is wrong; I think it's a bug in cufflinks.
But you can easily imitate what cufflinks does with a few lines of code. You can get the same layout with cufflinks.getLayout() and just need to set barmode to overlay.
import numpy as np
import pandas as pd
import plotly
import cufflinks

plotly.offline.init_notebook_mode()

df = pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
})

# One Histogram trace per column, all sharing the same explicit bins
data = list()
for dd in df:
    histo = plotly.graph_objs.Histogram(x=df[dd],
                                        name=dd,
                                        xbins={'start': -4, 'end': 4, 'size': 0.08},
                                        # tells (older) plotly to use xbins instead of its own binning
                                        autobinx=False,
                                        opacity=0.8
                                        )
    data.append(histo)

# Reuse the cufflinks default layout, but draw the bars on top of each other
layout = plotly.graph_objs.Layout(cufflinks.getLayout(),
                                  barmode='overlay')
fig = plotly.graph_objs.Figure(data=data,
                               layout=layout)
plotly.offline.iplot(fig)