Pandas plot hist sharex=False does not behave as expected

Tags: python, pandas, matplotlib

I am trying to plot histograms of several series from a dataframe. The series have different maximum values:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].max()

returns:

age_sent       1516.564016
last_seen       986.790035
forum_reply     137.000000
forum_cnt       155.000000
forum_exp        13.000000
forum_quest      10.000000

When I plot the histograms with sharex=False, subplots=True, it looks like the sharex property is ignored:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].plot.hist(figsize=(20, 10), logy=True, sharex=False, subplots=True)

I can of course plot each of them separately, but this is less desirable. I would also like to know what I am doing wrong.

The data I have is too big to be included, but it is easy to create something similar:

ttt = pd.DataFrame({'a': pd.Series(np.random.uniform(1, 1000, 100)), 'b': pd.Series(np.random.uniform(1, 10, 100))})

Now I have:

ttt.plot.hist(logy=True, sharex=False, subplots=True)

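Check the x axis: both subplots span the same range. I want each subplot to get its own range, as it does when the columns are plotted separately (but using one command with subplots):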

ttt['a'].plot.hist(logy=True)
ttt['b'].plot.hist(logy=True)

asked Sep 01 '16 04:09

Salvador Dali

2 Answers

The sharex argument (most likely) just falls through to matplotlib, where it only controls whether panning or zooming one axes also changes the others.

The issue you are having is that the same bins are used for all of the histograms (which is enforced by https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L2053, if I am understanding the code correctly). pandas assumes that if you plot multiple histograms, you are probably plotting columns of similar data, so using the same binning makes them comparable.
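To see why shared bins flatten the narrower column, here is a minimal sketch using only numpy. It assumes the shared bin edges are computed once over all values in the frame; the exact pandas implementation may differ, and the names `shared_edges`, `counts_b_shared` and `counts_b_own` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the question's example frame
rng = np.random.default_rng(0)
ttt = pd.DataFrame({'a': rng.uniform(1, 1000, 100),
                    'b': rng.uniform(1, 10, 100)})

# Assumption: one set of 10 bin edges spanning every value in the frame
shared_edges = np.histogram_bin_edges(ttt.to_numpy().ravel(), bins=10)

# With shared edges each bin is roughly 100 wide, so all of 'b' lands in the first bin
counts_b_shared, _ = np.histogram(ttt['b'], bins=shared_edges)

# With edges computed from 'b' alone, the values spread over several bins
counts_b_own, own_edges = np.histogram(ttt['b'], bins=10)

print(counts_b_shared)
print(counts_b_own)
```

The first printed array has all 100 counts in its first bin, which is why the subplot for the narrow column looks like a single bar on a very wide x axis.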

Assuming you have mpl >= 1.5 and numpy >= 1.11, you should write yourself a little helper function like

import matplotlib.pyplot as plt
import matplotlib as mpl 
import pandas as pd
import numpy as np

plt.ion()


def make_hists(df, fig_kwargs=None, hist_kwargs=None,
               style_cycle=None):
    '''

    Parameters
    ----------
    df : pd.DataFrame
        Datasource

    fig_kwargs : dict, optional
        kwargs to pass to `plt.subplots`

        defaults to {'figsize': (4, 1.5*len(df.columns)),
                     'tight_layout': True}

    hist_kwargs : dict, optional
        Extra kwargs to pass to `ax.hist`, defaults
        to `{'log': True, 'bins': 'auto'}`

    style_cycle : cycler
        Style cycle to use, defaults to 
        mpl.rcParams['axes.prop_cycle']

    Returns
    -------
    fig : mpl.figure.Figure
        The figure created

    ax_list : list
        The mpl.axes.Axes objects created 

    arts : dict 
        maps column names to the histogram artist
    '''
    if style_cycle is None:
        style_cycle = mpl.rcParams['axes.prop_cycle']

    if fig_kwargs is None:
        fig_kwargs = {}
    if hist_kwargs is None:
        hist_kwargs = {}

    hist_kwargs.setdefault('log', True)
    # this requires numpy >= 1.11
    hist_kwargs.setdefault('bins', 'auto')
    cols = df.columns

    fig_kwargs.setdefault('figsize', (4, 1.5*len(cols)))
    fig_kwargs.setdefault('tight_layout', True)
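    # squeeze=False keeps ax_lst an array even when df has a single column
    fig, ax_lst = plt.subplots(len(cols), 1, squeeze=False, **fig_kwargs)
    ax_lst = ax_lst.ravel()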
    arts = {}
    for ax, col, sty in zip(ax_lst, cols, style_cycle()):
        h = ax.hist(col, data=df, **hist_kwargs, **sty)
        ax.legend()

        arts[col] = h

    return fig, list(ax_lst), arts

dist = [1, 2, 5, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

make_hists(test_df)

answered Oct 09 '22 11:10

tacaswell

The current answer works, but there is an easier workaround in recent versions.

While df.plot.hist does not respect sharex=False, df.plot.density does.

dist = [1, 2, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

test_df.plot.density(subplots=True, sharex=False, sharey=False, layout=(-1, 2))

[Image: density plots respect sharex]
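If histograms rather than density curves are needed, another option that stays within pandas is DataFrame.hist, which is a different method from DataFrame.plot.hist. As far as I know it bins each column separately and takes sharex=False by default. The sketch below uses a synthetic frame like the one in the question; the `xmax` dictionary is only there to inspect the result, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

# Synthetic data shaped like the question's example frame
rng = np.random.default_rng(0)
ttt = pd.DataFrame({"a": rng.uniform(1, 1000, 100),
                    "b": rng.uniform(1, 10, 100)})

# One call, one subplot per column; extra kwargs such as log=True
# are forwarded to matplotlib's hist
axes = ttt.hist(bins=10, sharex=False, layout=(2, 1), log=True)

# Each subplot is titled with its column name; collect the upper x limits
xmax = {ax.get_title(): ax.get_xlim()[1] for ax in axes.ravel()}
print(xmax)
```

Each subplot should end up with an x range that matches its own column instead of the widest one.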

answered Oct 09 '22 12:10

hume
