I have two dataframes with identical column names. I would like to produce pair plots (scatter-plot matrices) to understand how the variables interact, with the first dataframe plotted in a different color than the second. Is this possible? It seems like the scatter_matrix function overwrites the previous plot by default.
Why is my first-generated plot overwritten? How can I visualize both data frames at once using the scatter_matrix function?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dat = pd.DataFrame({'x%i' % ii: np.random.uniform(-1, 1, 100) for ii in range(3)})
dat2 = pd.DataFrame({'x%i' % ii: np.random.uniform(0, 1, 100) for ii in range(3)})
ax = pd.plotting.scatter_matrix(dat, c='orange')
pd.plotting.scatter_matrix(dat2, c='k')
# pd.plotting.scatter_matrix(dat2, c='k', ax=ax) # results in error
plt.savefig('example')

(The solution I desire should have two separate point colors, with one set ranging from 0 to 1 and the other ranging from -1 to 1.)
If you are willing to use another library, seaborn, and if I understood the question correctly, this can be done easily with sns.pairplot. You just need to concatenate both dataframes and add a column to use as hue, holding the labels you want in the legend.
import seaborn as sns
sns.pairplot(pd.concat([dat.assign(hue='dat'),
                        dat2.assign(hue='dat2')]),
             hue='hue',
             diag_kind='hist',
             palette=['orange', 'k'])
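
To see what pairplot actually receives, the concatenated frame can be built and inspected on its own. This is only a sketch: the seeded generator `rng`, the variable name `combined`, and `ignore_index=True` (which gives the stacked rows a unique index) are additions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (assumption, not in the original).
rng = np.random.default_rng(0)
dat = pd.DataFrame({f"x{i}": rng.uniform(-1, 1, 100) for i in range(3)})
dat2 = pd.DataFrame({f"x{i}": rng.uniform(0, 1, 100) for i in range(3)})

# Tag each frame with a label column, then stack the rows.
# ignore_index=True renumbers the rows 0..199 instead of repeating 0..99 twice.
combined = pd.concat(
    [dat.assign(hue="dat"), dat2.assign(hue="dat2")],
    ignore_index=True,
)

print(combined.shape)                   # rows from both frames, plus the label column
print(combined["hue"].value_counts())   # how many rows carry each label
```

The `hue` column is what seaborn uses to pick a color per row and to build the legend entries.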

Note: I don't think the histograms on the diagonal look good in this case. I would rather use 'kde' instead of 'hist' for the diag_kind parameter, but it depends on what you want.
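
If you would rather stay with pandas alone, one possible workaround is to call scatter_matrix once on the stacked data and pass one color per row. As far as I can tell, each scatter_matrix call without ax draws on a new figure, and plt.savefig only saves the current one, which would explain why only the second plot ends up in the file. The sketch below assumes neither frame has missing values (so the color list lines up with the plotted rows); the names `combined`, `colors`, and the output filename are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dat = pd.DataFrame({f"x{i}": rng.uniform(-1, 1, 100) for i in range(3)})
dat2 = pd.DataFrame({f"x{i}": rng.uniform(0, 1, 100) for i in range(3)})

# Stack both frames and build a matching list with one color per row.
combined = pd.concat([dat, dat2], ignore_index=True)
colors = ["orange"] * len(dat) + ["k"] * len(dat2)

# Extra keyword arguments such as c are forwarded to the scatter calls,
# so a single call draws both groups on the same grid of axes.
axes = pd.plotting.scatter_matrix(combined, c=colors, diagonal="hist")
plt.savefig("example_combined.png")
```

Note that the diagonal histograms here are drawn from the combined data, not per group, so the seaborn version is still the better choice if you want the diagonal split by dataframe.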