 

What to do with missing values when plotting with seaborn?

I replaced the missing values with NaN using the following lambda function:

import numpy as np
data = data.applymap(lambda x: np.nan if isinstance(x, str) and x.isspace() else x)  # whitespace-only strings -> NaN (str replaces Python 2's basestring)

where data is the DataFrame I am working on.
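A vectorized sketch of the same cleanup, assuming (hypothetically) that alcconsumption was read in as strings, with whitespace-only entries marking missing values. DataFrame.replace with a regex turns the blanks into NaN, and pd.to_numeric then gives the column a numeric dtype, which a histogram needs. The sample DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real DataFrame: numbers stored as strings,
# with whitespace-only strings marking missing entries.
data = pd.DataFrame({"alcconsumption": ["3.5", " ", "7.2", "  ", "0.5"]})

# Whitespace-only strings -> NaN (same effect as the applymap/lambda version).
data = data.replace(r"^\s+$", np.nan, regex=True)

# The column is still object dtype; convert it to floats for plotting.
# errors="coerce" would also turn any other non-numeric entry into NaN.
data["alcconsumption"] = pd.to_numeric(data["alcconsumption"], errors="coerce")

print(data["alcconsumption"].tolist())  # [3.5, nan, 7.2, nan, 0.5]
```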

Afterwards, I tried to plot one of its attributes, alcconsumption, with seaborn.distplot as follows:

import matplotlib.pyplot as plt
import seaborn

seaborn.distplot(data['alcconsumption'], hist=True, bins=100)
plt.xlabel('AlcoholConsumption')
plt.ylabel('Frequency(normalized 0->1)')

It's giving me the following error:

AttributeError: max must be larger than min in range parameter.
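A plausible cause (hedged: the exact exception type and message vary across NumPy and matplotlib versions) is that the histogram picks its bin range from the data's minimum and maximum, and NumPy's min/max propagate NaN, so the range becomes [nan, nan]. pandas' own min/max skip NaN by default, which is why the column can look fine when inspected with pandas. The Series below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
values = pd.Series([3.5, np.nan, 7.2, 0.5])
arr = values.to_numpy()

# NumPy reductions propagate NaN, so an auto-detected range is [nan, nan].
print(arr.min(), arr.max())              # nan nan

# pandas skips NaN by default (skipna=True), as do NumPy's nan-aware variants.
print(values.min(), values.max())        # 0.5 7.2
print(np.nanmin(arr), np.nanmax(arr))    # 0.5 7.2
```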
asked Oct 02 '15 at 07:10 by datavinci


People also ask

How do I find missing values in Seaborn?

The key function for visualizing missing data is pandas' isna(), which reports whether each element of a DataFrame is missing. Calling isna() on a DataFrame returns a Boolean DataFrame with True for missing values and False for non-missing ones.
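A small sketch of isna() on a made-up DataFrame (the column names are hypothetical): the result has the same shape as the input, and since True counts as 1, summing it counts the missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing entries.
df = pd.DataFrame({
    "alcconsumption": [3.5, np.nan, 7.2, np.nan],
    "lifeexpectancy": [71.0, 68.5, np.nan, 80.1],
})

mask = df.isna()               # same shape as df: True where a value is missing
print(mask)
print(mask.sum())              # missing values per column
print(int(mask.sum().sum()))   # total missing values: 3
```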

Which plot can be used to check for missing values?

A matrix plot: it shows where in the dataset the missing values are located.
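A minimal sketch of a matrix-style missing-value plot using only matplotlib (an assumption; this is one way to draw such a plot, not a specific library's matrix function): each row of the image is a record, each column a variable, and dark cells mark missing values. The DataFrame is hypothetical; call fig.savefig(...) or plt.show() to see the result.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with a few missing entries.
df = pd.DataFrame({
    "alcconsumption": [3.5, np.nan, 7.2, np.nan, 1.1],
    "lifeexpectancy": [71.0, 68.5, np.nan, 80.1, 75.3],
    "incomeperperson": [np.nan, 900.0, 1500.0, 2100.0, 400.0],
})

missing = df.isna().to_numpy().astype(int)  # 1 = missing, 0 = present

fig, ax = plt.subplots()
# gray_r maps 0 to white and 1 to black, so missing cells are dark.
im = ax.imshow(missing, aspect="auto", cmap="gray_r",
               interpolation="nearest", vmin=0, vmax=1)
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns, rotation=45, ha="right")
ax.set_ylabel("row")
ax.set_title("Missing values (dark = missing)")
fig.tight_layout()
```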

Which function can be used to visualise the null values in a dataset?

One way to find whether the data has null values is the isnull() function, which flags every null entry. Checking its result column by column shows where the nulls are; in the example this answer refers to, all of them are in the Age column.
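A sketch of finding which columns contain nulls with isnull() (an alias of isna()); the DataFrame, including its Age column, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data where only Age has gaps.
df = pd.DataFrame({
    "Name": ["Ann", "Bo", "Cy", "Di"],
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

null_counts = df.isnull().sum()                           # nulls per column
cols_with_nulls = null_counts[null_counts > 0].index.tolist()

print(null_counts)
print(cols_with_nulls)  # ['Age']
```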

How does Python handle missing values?

To check for missing values in a pandas DataFrame, use the isnull() and notnull() functions. Both check whether each value is NaN: isnull() returns True for missing values, while notnull() returns True for values that are present. They also work on a pandas Series, to find the null values in that series.
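A sketch of both on a hypothetical Series. Note that they return Boolean masks, not the values themselves, so keeping only the present values means using the mask as an index (or calling dropna()).

```python
import numpy as np
import pandas as pd

s = pd.Series([3.5, np.nan, 7.2, np.nan, 0.5])  # hypothetical data

print(s.isnull().tolist())   # [False, True, False, True, False]
print(s.notnull().tolist())  # [True, False, True, False, True]

present = s[s.notnull()]     # boolean indexing keeps the non-null values
print(present.tolist())      # [3.5, 7.2, 0.5]
print(present.equals(s.dropna()))  # True
```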


2 Answers

This is a known issue with matplotlib/pylab histograms!

See e.g. https://github.com/matplotlib/matplotlib/issues/6483

where various workarounds are suggested, two favourites (for example from https://stackoverflow.com/a/19090183/1021819) being:

import numpy as np
import seaborn

nbins = 100
A = data['alcconsumption']
Anan = A[~np.isnan(A)]  # Remove the NaNs (A.dropna() does the same)

seaborn.distplot(Anan, hist=True, bins=nbins)
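The same workaround at the NumPy level, as a sketch with made-up values: once the NaNs are removed, the automatically chosen range is finite, and np.histogram returns exactly nbins counts covering every remaining value.

```python
import numpy as np
import pandas as pd

nbins = 100
A = pd.Series([0.5, 2.0, np.nan, 7.5, 10.0])  # hypothetical column

Anan = A[~np.isnan(A)]  # same filter as in the answer; A.dropna() is equivalent

counts, edges = np.histogram(Anan, bins=nbins)
print(len(counts), len(edges))   # 100 101
print(counts.sum())              # 4: every non-NaN value is counted
```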

Alternatively, specify the bin edges explicitly (here they are still computed from Anan, since the range has to come from the non-NaN values):

Amin = Anan.min()
Amax = Anan.max()
# nbins + 1 edges give exactly nbins bins
seaborn.distplot(A, hist=True, bins=np.linspace(Amin, Amax, nbins + 1))
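Why explicit edges can help, sketched with NumPy and made-up values: when bins is an array of edges, np.histogram does not infer a range from the data, and in current NumPy a NaN falls beyond the last edge, so it is not counted in any bin. Note that nbins + 1 edges are needed for nbins bins.

```python
import numpy as np
import pandas as pd

nbins = 4
A = pd.Series([0.5, 2.0, np.nan, 7.5, 10.0])  # hypothetical column
Anan = A.dropna()

# Edges come from the NaN-free values; nbins + 1 edges -> nbins bins.
edges = np.linspace(Anan.min(), Anan.max(), nbins + 1)

# Passing the full column, NaN included: the NaN lands in no bin.
counts, _ = np.histogram(A.to_numpy(), bins=edges)
print(edges)
print(counts)    # [2 0 1 1]
```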
answered Oct 06 '22 at 04:10 by jtlz2


You can use the following line to select the non-NaN values for a distribution plot using seaborn:

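# notnull() alone returns booleans; use it as an index to keep the actual values
seaborn.distplot(data['alcconsumption'][data['alcconsumption'].notnull()], hist=True, bins=100)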
answered Oct 06 '22 at 02:10 by ZicoNuna