How to overlay a Seaborn jointplot with a "marginal" (distribution histogram) from a different dataset

Tags: python, pandas, overlay, seaborn

I have plotted a Seaborn jointplot from a set of "observed counts vs concentration" values, which are stored in a pandas DataFrame. I would like to overlay (on the same set of axes) a marginal (i.e. a univariate distribution) of the "expected counts" for each concentration on top of the existing marginal, so that the two can be easily compared.

This graph is very similar to what I want, although it will have different axes and only two datasets:

Here is an example of how my data is laid out and related:

df_observed

x axis --> log2(concentration): 1, 1, 1, 2, 3, 3, 3 (zero-counts have been omitted)

y axis --> log2(count): 4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4 (zero-counts have been omitted)

df_expected

x axis --> log2(concentration): 1, 1, 1, 2, 2, 2, 3, 3, 3

Overlaying the distribution of df_expected on top of that of df_observed would therefore show where counts are missing at each concentration.

What I currently have

Jointplot with the observed counts at each concentration.

Separate jointplot of the expected counts at each concentration. I want the marginal from this plot to be overlaid on top of the marginal from the above jointplot.

PS: I am new to Stack Overflow so any suggestions about how to better ask questions will be met with gratitude. Also, I have searched extensively for an answer to my question but to no avail. In addition, a Plotly solution would be equally helpful. Thank you

asked Mar 10 '16 15:03

Nonchalant

1 Answer

I wrote a function to plot it, very loosely based on @blue_chip's idea. You might still need to tweak it a bit for your specific needs.

Example output: [image not included]

Example data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

n = 1000
m1 = -3
m2 = 3

# Two Gaussian clouds centred on (m1, m1) and (m2, m2), plus their element-wise sum.
df1 = pd.DataFrame((np.random.randn(n) + m1).reshape(-1, 2), columns=['x', 'y'])
df2 = pd.DataFrame((np.random.randn(n) + m2).reshape(-1, 2), columns=['x', 'y'])
df3 = pd.DataFrame(df1.values + df2.values, columns=['x', 'y'])
df1['kind'] = 'dist1'
df2['kind'] = 'dist2'
df3['kind'] = 'dist1+dist2'
# Long format: one row per point, with a 'kind' column naming its group.
df = pd.concat([df1, df2, df3])

Function definition:

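def multivariateGrid(col_x, col_y, col_k, df, k_is_color=False, scatter_alpha=.5):
    """Scatter plot with one overlaid marginal distribution per group in col_k."""
    def colored_scatter(x, y, c=None):
        # Build a callable for plot_joint that ignores the grid's own x/y
        # and draws this group's points instead.
        def scatter(*args, **kwargs):
            args = (x, y)
            if c is not None:
                kwargs['c'] = c
            kwargs['alpha'] = scatter_alpha
            plt.scatter(*args, **kwargs)

        return scatter

    g = sns.JointGrid(x=col_x, y=col_y, data=df)
    color = None
    legends = []
    for name, df_group in df.groupby(col_k):
        legends.append(name)
        if k_is_color:
            # Only valid when the group labels are themselves matplotlib colors.
            color = name
        g.plot_joint(colored_scatter(df_group[col_x], df_group[col_y], color))
        # sns.distplot is deprecated since seaborn 0.11; histplot with kde=True
        # and stat='density' gives a comparable histogram-plus-density marginal.
        sns.histplot(x=df_group[col_x].values, ax=g.ax_marg_x,
                     color=color, kde=True, stat='density')
        sns.histplot(y=df_group[col_y].values, ax=g.ax_marg_y,
                     color=color, kde=True, stat='density')
    # Also draw the distribution of all groups combined, in grey.
    sns.histplot(x=df[col_x].values, ax=g.ax_marg_x,
                 color='grey', kde=True, stat='density')
    sns.histplot(y=df[col_y].values, ax=g.ax_marg_y,
                 color='grey', kde=True, stat='density')
    # One legend entry per group, attached to the scatter axes.
    g.ax_joint.legend(legends)
    return g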

Usage:

multivariateGrid('x', 'y', 'kind', df=df)

answered Sep 27 '22 21:09

ntg
