I have two dataframes with identical column names. I would like to produce pair-plot scatter plots to understand how the variables interact, and I would like to plot the first dataframe in a different color than the second. Is this possible? It seems like the scatter_matrix function overwrites the previous plot by default.
Why is my first-generated plot overwritten? How can I visualize both dataframes at once using the scatter_matrix function?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dat = pd.DataFrame({'x%i' % ii: np.random.uniform(-1, 1, 100) for ii in range(3)})
dat2 = pd.DataFrame({'x%i' % ii: np.random.uniform(0, 1, 100) for ii in range(3)})
ax = pd.plotting.scatter_matrix(dat, c='orange')
pd.plotting.scatter_matrix(dat2, c='k')
# pd.plotting.scatter_matrix(dat2, c='k', ax=ax) # results in error
plt.savefig('example')
(The solution I desire should have two separate point colors, with one set ranging from 0 to 1 and the other ranging from -1 to 1.)
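On the "why" part: as far as I can tell, the first plot is not really overwritten. When no ax is passed, each call to scatter_matrix appears to create its own figure, and plt.savefig only writes the current (most recent) one. Here is a small check you can run yourself. The Agg backend and the variable names n_created, same_figure and saved_is_second are my own choices for illustration, not part of the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dat = pd.DataFrame({'x%i' % ii: np.random.uniform(-1, 1, 100) for ii in range(3)})
dat2 = pd.DataFrame({'x%i' % ii: np.random.uniform(0, 1, 100) for ii in range(3)})

n_before = len(plt.get_fignums())
axes1 = pd.plotting.scatter_matrix(dat, c='orange')
axes2 = pd.plotting.scatter_matrix(dat2, c='k')

# How many figures did the two calls open?
n_created = len(plt.get_fignums()) - n_before
# Do both axes grids live on the same figure?
same_figure = axes1[0, 0].figure is axes2[0, 0].figure
# Is the figure that plt.savefig would write the one from the second call?
saved_is_second = plt.gcf() is axes2[0, 0].figure

print(n_created, same_figure, saved_is_second)
```

If the two grids sit on different figures, the orange plot still exists; it is just not the figure that gets saved.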
If you are willing to use another library, seaborn, and if I understood the question correctly, this can be done easily with sns.pairplot. You just need to concat both dataframes and create a column to use as hue, holding the name you want in the legend.
import seaborn as sns
sns.pairplot(pd.concat([dat.assign(hue='dat'),
                        dat2.assign(hue='dat2')]),
             hue='hue',
             diag_kind='hist',
             palette=['orange', 'k'])
Note: I find that the diagonal does not look good with histograms in this case. I would rather use 'kde' instead of 'hist' for the diag_kind parameter, but it depends on what you want.