
Seaborn pairplot and NaN values

I'm trying to understand why this fails, even though the documentation says:

dropna : boolean, optional
    Drop missing values from the data before plotting.

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.__version__
# '0.7.dev'
# generate an example DataFrame
a = pd.DataFrame(data={
    'a': np.random.normal(size=(100,)),
    'b': np.random.lognormal(size=(100,)),
    'c': np.random.exponential(size=(100,))})
sns.pairplot(a)  # this works as expected
# snip
b = a.copy()
b.iloc[5, 2] = np.nan  # replace one value in column 'c' with NaN
sns.pairplot(b)  # this fails with the error
                 # "AttributeError: max must be larger than min in range parameter."
                 # raised in histogram(a, bins, range, normed, weights, density)
sns.pairplot(b, dropna=True)  # same error as above
Diziet Asahi asked Jul 18 '15 16:07




1 Answer

When you pass the data directly, i.e.

sns.pairplot(b)  # same as sns.pairplot(b, x_vars=['a', 'b', 'c'], y_vars=['a', 'b', 'c'], dropna=True)

you are plotting every column of the DataFrame against every other column, so make sure every column has the same number of non-missing values.

sns.pairplot(b, x_vars=['a', 'c'], y_vars=['a', 'b', 'c'], dropna=True)

In this case it works fine, but the plot will differ slightly because the NaN value has been removed.

So, if you want to plot the whole data set, then:

  • either replace the null values using fillna(),

  • or drop every row that contains a NaN value:

    b = b.drop(b.index[5])
    sns.pairplot(b)
    

    [Image: pairplot for dropped values]
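The two options above can be sketched with pandas alone. This is only an illustration: the fixed seed and the choice of the column median as the fill value are assumptions made for the example, not something the question or the seaborn documentation prescribes.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
# The seed is an assumption, used only to make the example repeatable.
rng = np.random.RandomState(0)
b = pd.DataFrame({
    'a': rng.normal(size=100),
    'b': rng.lognormal(size=100),
    'c': rng.exponential(size=100),
})
b.iloc[5, 2] = np.nan  # one missing value in column 'c'

# Option 1: replace missing values.
# Filling with the column median is an arbitrary choice for illustration.
filled = b.fillna(b.median())

# Option 2: drop every row that contains at least one NaN,
# without having to know the row position in advance.
dropped = b.dropna()

print(filled.shape, dropped.shape)
```

Either frame can then be passed to sns.pairplot(). Using b.dropna() rather than b.drop(b.index[5]) avoids hard-coding which row holds the NaN, while filling keeps all 100 rows at the cost of introducing a made-up value.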

Suresh2692 answered Sep 21 '22 01:09
