
How to Plot a Matrix of Seaborn Distplots for All Columns in the Dataframe

Imagine I have a dataframe with 9 columns. I want to be able to achieve the same effect as df.hist(), but with sns.distplot().

In other words, I want to plot sns.distplot() for every column of the dataframe in a grid of 3 rows and 3 columns, where each subplot shows the sns.distplot() of one column.

I experimented with a for loop over the axes and the dataframe's columns, but I can only get it to work when I specify the number of columns, which puts every plot in a single row. I'm not sure how to write the code so that it works for both rows and columns.

I also looked into sns.FacetGrid, but I'm not sure how to go about solving this problem using FacetGrid.

I find that df.hist() does exactly what I want, but I want to do the same thing with sns.distplot for all the columns, in the same layout as the output of df.hist().

If it helps to put the context of the dataframe, I'm essentially reading Google Colab's training and testing sets for the California Housing Dataset which contains all the columns except for the ocean_proximity. If you want to help me figure out this problem using that dataset, please get it from Kaggle and drop the ocean_proximity column.

My approach for 9 columns:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('housing.csv')
df.drop('ocean_proximity', axis=1, inplace=True)
fig, axes = plt.subplots(ncols=len(df.columns), figsize=(30,15))
for ax, col in zip(axes, df.columns):
    sns.distplot(df[col], ax=ax)
plt.tight_layout()
plt.show()
pythonRCNewbie asked Mar 28 '19 02:03


2 Answers

Slightly more elegant imo than the solution by @Bruce Swain:

import matplotlib.pyplot as plt
import seaborn as sns

# plt.subplot positions are 1-based, so start enumerate at 1
for i, column in enumerate(df.columns, 1):
    plt.subplot(3, 3, i)
    sns.histplot(df[column])
plt.tight_layout()
plt.show()
Josiah Coad answered Sep 21 '22 13:09


You can create multiple subplots in a single matplotlib figure using plt.subplots, like this:

import matplotlib.pyplot as plt
# Define the number of rows and columns you want
n_rows=3
n_cols=3
# Create the subplots
fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols)

You can view the subplots function as creating a matrix (2D array) of shape [n_rows, n_cols], and using the coordinates of elements of the matrix to choose where to plot.
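To make the matrix analogy concrete, here is a minimal sketch that inspects what plt.subplots returns for a 3x3 grid. It assumes the non-interactive Agg backend so that it runs without a display; the variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

n_rows, n_cols = 3, 3
fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols)

# axes is a 2D NumPy array of Axes objects, indexed as axes[row, col]
grid_shape = axes.shape
top_right = axes[0, n_cols - 1]

# axes.flat walks the grid row by row (left to right, then top to bottom)
flat_axes = list(axes.flat)
```

Here grid_shape is (3, 3), and the third element of flat_axes is the same object as axes[0, 2].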

You then plot each column in its own subplot, using the ax argument to pass one element of the matrix. Using ax=axes[i, j] specifies the subplot you want to draw in:

import seaborn as sns
# i // n_cols is the row index, i % n_cols is the column index
for i, column in enumerate(df.columns):
    sns.distplot(df[column], ax=axes[i // n_cols, i % n_cols])

From BenCaldwell's comment: "i//ncols gives the floor division which is the row when you are working left to right then top to bottom. i%ncols will give you the integer remainder which is the column when you are working left to right top to bottom."
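The floor-division and remainder arithmetic can be checked without any plotting. The sketch below, assuming a grid 3 columns wide and 9 plots, uses Python's built-in divmod, which returns both values in one call:

```python
n_cols = 3
n_plots = 9

# divmod(i, n_cols) returns (i // n_cols, i % n_cols) in one call
positions = [divmod(i, n_cols) for i in range(n_plots)]

for i, (row, col) in enumerate(positions):
    print(f"plot {i} -> axes[{row}, {col}]")
```

Plot 3, for example, lands in the first cell of the second row, axes[1, 0].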

If you want a plain histogram of the data rather than distplot's estimate of the distribution behind it, you can use the newer histplot function. distplot has been deprecated since seaborn 0.11, so histplot is also the safer choice going forward.

Bruce Swain answered Sep 18 '22 13:09