How to use a specific list of bins for multiple histograms from DataFrame, when using plotly+cufflinks?

Question

It is relatively easy to manually give a list of bins when plotting a histogram with matplotlib, as shown for example here.

A simple example of this is the following:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist(np.random.randn(10000), bins=np.arange(-4, 4, 0.1))
ax.hist(0.2 * np.random.randn(10000), bins=np.arange(-4, 4, 0.1))
plt.show()

This can also be equivalently done from a pandas.DataFrame with:

import pandas as pd

pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).plot(kind='hist', bins=np.arange(-4, 4, 0.1))

Going further, plotly can interface directly with pandas through the cufflinks module, which makes things like the following possible:

import cufflinks  # importing cufflinks adds .iplot() to pandas objects

pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).iplot(kind='hist', bins=100)

[Image: the resulting histogram plot]

But here is the catch: the iplot method provided by cufflinks does not seem to accept a list for bins. When a number is provided, as in the above example, that number is used to bin both datasets independently, which results in unequal binning and potentially misleading results (see the equal heights in the above plot).
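The unequal binning described above can be reproduced without any plotting library. The following is a minimal sketch, assuming that a bin count is turned into equal-width bins spanning each dataset's own minimum and maximum. That rule is a stand-in for illustration and is not necessarily the exact rule plotly applies; the helper equal_width_edges is hypothetical.

```python
import random


def equal_width_edges(values, nbins):
    """Return nbins + 1 equally spaced edges from min(values) to max(values).

    Hypothetical helper: an assumed stand-in for per-dataset automatic binning.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins + 1)]


rng = random.Random(0)  # seeded so the run is reproducible
first = [rng.gauss(0, 1) for _ in range(10000)]
second = [0.2 * rng.gauss(0, 1) for _ in range(10000)]

# Each dataset gets its own 100 bins, so the bin widths differ.
edges_first = equal_width_edges(first, 100)
edges_second = equal_width_edges(second, 100)

width_first = edges_first[1] - edges_first[0]
width_second = edges_second[1] - edges_second[0]

print(f"bin width, first:  {width_first:.4f}")
print(f"bin width, second: {width_second:.4f}")
```

Under this assumption the second dataset gets much narrower bins, which offsets its higher peak density and would explain why both histograms reach a similar height.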

While this effect can be somewhat mitigated using the histnorm='density' option, one may want to see the counts per bin and not a density.

Is there a way around this?

jorge.santos · Accepted Answer

I have added an update for this. You should now be able to specify bins=(start, end, size):

pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).iplot(kind='hist', bins=(-4, 4, .08))

This should now return a histogram with the custom bins.
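The question starts from a list of bin edges, while this answer takes a (start, end, size) tuple. For uniformly spaced edges the conversion is mechanical, as in the sketch below. The helper edges_to_start_end_size is hypothetical and not part of cufflinks or plotly, and it assumes that end corresponds to the last edge of the list.

```python
import math


def edges_to_start_end_size(edges, rel_tol=1e-9):
    """Convert uniformly spaced bin edges to a (start, end, size) tuple.

    Hypothetical helper; raises ValueError if the edges are not uniform,
    because a single bin size cannot describe them.
    """
    edges = [float(e) for e in edges]
    if len(edges) < 2:
        raise ValueError("need at least two bin edges")
    size = edges[1] - edges[0]
    if size <= 0:
        raise ValueError("edges must be strictly increasing")
    for left, right in zip(edges, edges[1:]):
        if not math.isclose(right - left, size, rel_tol=rel_tol):
            raise ValueError("edges are not uniformly spaced")
    return edges[0], edges[-1], size


edges = [-4 + 0.5 * i for i in range(17)]  # -4.0, -3.5, ..., 4.0
bins = edges_to_start_end_size(edges)
print(bins)  # (-4.0, 4.0, 0.5)
```

The resulting tuple could then be passed as bins=bins to iplot. Non-uniform edge lists cannot be expressed this way.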

Maximilian Peters · Answer

As far as I know there is no direct way of doing it in cufflinks. The output shown in your code is wrong in my opinion, i.e. I think that's a bug in cufflinks.

But you can easily imitate the cufflinks function with a few lines of code. You can get the same layout with cufflinks.getLayout() and just need to set barmode to overlay.


import numpy as np
import pandas as pd
import plotly.graph_objs
import plotly.offline
import cufflinks

plotly.offline.init_notebook_mode()

df = pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
})

data = list()

# One Histogram trace per column, all sharing the same explicit bins
for dd in df:
    histo = plotly.graph_objs.Histogram(x=df[dd],
                                        name=dd,
                                        xbins={'start': -4, 'end': 4, 'size': 0.08},
                                        autobinx=False,
                                        opacity=0.8
                                       )
    data.append(histo)

# Reuse the cufflinks layout and overlay the bars instead of stacking them
layout = plotly.graph_objs.Layout(cufflinks.getLayout(),
                                  barmode='overlay')
fig = plotly.graph_objs.Figure(data=data,
                               layout=layout)
plotly.offline.iplot(fig)
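As a rough sanity check of what shared bins should yield, the counts can be computed by hand with the same start, end and size values used in xbins above. This is a minimal sketch using only the standard library; count_in_bins is a hypothetical helper, and its treatment of edge values (left-closed bins, values outside the range dropped) is an assumption that may differ from plotly's.

```python
import random


def count_in_bins(values, start, end, size):
    """Count values in equal-width bins [start, start + size), ... up to end.

    Hypothetical helper: values outside [start, end) are ignored.
    """
    nbins = round((end - start) / size)
    counts = [0] * nbins
    for v in values:
        if start <= v < end:
            # Clamp to guard against floating-point rounding at the top edge.
            idx = min(int((v - start) / size), nbins - 1)
            counts[idx] += 1
    return counts


rng = random.Random(0)  # seeded so the run is reproducible
first = [rng.gauss(0, 1) for _ in range(10000)]
second = [0.2 * rng.gauss(0, 1) for _ in range(10000)]

# Both datasets use the same bins, so the counts are directly comparable.
counts_first = count_in_bins(first, -4, 4, 0.08)
counts_second = count_in_bins(second, -4, 4, 0.08)

print("peak count, first: ", max(counts_first))
print("peak count, second:", max(counts_second))
```

With shared bins the narrower distribution should show a clearly higher peak, unlike the equal heights produced by independent binning.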

Tags: pandas, histogram, plotly

glS
