 

Generating multiple scatter_matrix plots in the same chart with pandas

Tags:

python

pandas

I have two dataframes with identical column names. I would like to produce pairplot scatter plots to understand how the variables interact, and I would like to plot the first dataframe in a different color than the second. Is this possible? It seems like the scatter_matrix function overwrites the previous plot by default.

Why is my first-generated plot overwritten? How can I visualize both data frames at once using the scatter_matrix function?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dat = pd.DataFrame({'x%i' % ii: np.random.uniform(-1, 1, 100) for ii in range(3)})
dat2 = pd.DataFrame({'x%i' % ii: np.random.uniform(0, 1, 100) for ii in range(3)})
ax = pd.plotting.scatter_matrix(dat, c='orange')
pd.plotting.scatter_matrix(dat2, c='k')
# pd.plotting.scatter_matrix(dat2, c='k', ax=ax) # results in error
plt.savefig('example')
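A quick check suggests the first plot is not actually overwritten. When no `ax` is passed, each `scatter_matrix` call draws its grid on a brand-new figure, and `plt.savefig` writes only the current (most recent) figure. This is a minimal sketch that assumes a recent pandas/matplotlib. The `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dat = pd.DataFrame({'x%i' % ii: np.random.uniform(-1, 1, 100) for ii in range(3)})
dat2 = pd.DataFrame({'x%i' % ii: np.random.uniform(0, 1, 100) for ii in range(3)})

# Without ax=, each call draws its grid of axes on a brand-new figure.
axes1 = pd.plotting.scatter_matrix(dat, c='orange')
axes2 = pd.plotting.scatter_matrix(dat2, c='k')

# scatter_matrix returns a 2-D array of Axes; each Axes knows its figure.
fig1 = axes1[0, 0].figure
fig2 = axes2[0, 0].figure

print(fig1 is fig2)       # False: the first plot still exists on its own figure
print(plt.gcf() is fig2)  # True: plt.savefig() only writes this current figure
```

The first plot is therefore still available, and `fig1.savefig(...)` would write it. Getting both datasets into one grid requires drawing them on a single figure.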


(The solution I desire should have two separate point colors, with one set ranging from 0 to 1 and the other ranging from -1 to 1.)

kilojoules asked Dec 07 '22 10:12



1 Answer

If you are willing to use another library, seaborn, and if I understood correctly, this can be done easily with sns.pairplot. You just need to concatenate both dataframes and create a column to use as the hue, holding the names you want to appear in the legend.

import seaborn as sns
sns.pairplot(pd.concat([dat.assign(hue='dat'), 
                        dat2.assign(hue='dat2')]), 
             hue='hue', 
             diag_kind='hist', 
             palette=['orange', 'k'])


Note: I find that the histogram diagonal does not look good in this case. I would rather use 'kde' instead of 'hist' for the diag_kind parameter, but it depends on what you want.

Ben.T answered Apr 27 '23 00:04

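If you would rather stay with `pd.plotting.scatter_matrix` and avoid seaborn, a pandas-only sketch of the same concatenation idea is possible. You make a single call on the stacked frame and pass one color per row through `c`, which `scatter_matrix` forwards to matplotlib's `scatter`. This assumes neither frame has missing values, because rows with NaNs are dropped per panel and a per-row color list would then stop lining up. The legend built from `Patch` handles and the labels `'dat'`/`'dat2'` are illustrative additions, since `scatter_matrix` draws no legend itself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas as pd

dat = pd.DataFrame({'x%i' % ii: np.random.uniform(-1, 1, 100) for ii in range(3)})
dat2 = pd.DataFrame({'x%i' % ii: np.random.uniform(0, 1, 100) for ii in range(3)})

# Stack both frames into one; ignore_index gives a clean 0..199 index.
combined = pd.concat([dat, dat2], ignore_index=True)

# One color per row, in the same order as the concatenated rows.
colors = ['orange'] * len(dat) + ['k'] * len(dat2)

# A single call, so both datasets share one figure and one set of axis limits
# (which therefore cover -1..1 for dat and 0..1 for dat2).
axes = pd.plotting.scatter_matrix(combined, c=colors, alpha=0.6, figsize=(6, 6))

# scatter_matrix adds no legend, so build one by hand from colored patches.
axes[0, 0].figure.legend(
    handles=[Patch(color='orange', label='dat'), Patch(color='k', label='dat2')],
    loc='upper right',
)
```

The trade-off is the diagonal. It shows one histogram of the pooled values rather than one per dataframe, which is where seaborn's `hue` handling is more convenient. Save the result with `plt.savefig` as in the question.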