I replaced the missing values with NaN using the following lambda function:
data = data.applymap(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x)
where data is the dataframe I am working on.
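For readers on Python 3, where basestring no longer exists, a minimal sketch of the same cleaning step might look like this. It assumes the blanks are stored as whitespace-only strings; the sample frame and its country column are made up for illustration, and DataFrame.replace with a regular expression is swapped in for the applymap call above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: blanks are stored as whitespace-only strings.
data = pd.DataFrame({
    "country": ["A", "B", "C"],
    "alcconsumption": ["1.5", " ", "3.0"],
})

# Replace every cell that consists only of whitespace with NaN.
cleaned = data.replace(r"^\s+$", np.nan, regex=True)

# Show which rows of the column are now missing.
print(cleaned["alcconsumption"].isna().tolist())
```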
Afterwards, I tried to plot one of its attributes, alcconsumption, using seaborn.distplot as follows:
seaborn.distplot(data['alcconsumption'],hist=True,bins=100)
plt.xlabel('AlcoholConsumption')
plt.ylabel('Frequency(normalized 0->1)')
It's giving me the following error:
AttributeError: max must be larger than min in range parameter.
The key step in both approaches to visualizing missing data is to use the pandas isna() function to check whether each element in the dataframe is missing. Calling isna() on a pandas dataframe returns a boolean dataframe with True for missing values and False for non-missing ones.
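As a small illustration of that behaviour, the sketch below calls isna() on a made-up two-column frame (the incomeperperson column and all values are hypothetical) and then counts the missing entries per column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({
    "alcconsumption": [1.5, np.nan, 3.0],
    "incomeperperson": [np.nan, 2000.0, 3500.0],
})

# Boolean frame of the same shape: True where a value is missing.
mask = df.isna()

# True counts as 1 when summed, so this gives the NaN count per column.
missing_per_column = mask.sum()

print(mask)
print(missing_per_column)
```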
Checking for missing values using isnull() and notnull()
To check for missing values in a pandas DataFrame, we can use the functions isnull() and notnull(). Both functions report whether each value is NaN or not. They can also be used on a pandas Series to find null values in that series.
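A short sketch of the Series case, using made-up values: isnull() flags the NaN entries, notnull() is its exact complement, and summing either one gives a count.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing entries.
s = pd.Series([1.5, np.nan, 3.0, np.nan], name="alcconsumption")

is_missing = s.isnull()   # True where the value is NaN
is_present = s.notnull()  # True where the value is not NaN

# Summing a boolean series counts its True entries.
print(int(is_missing.sum()), "missing,", int(is_present.sum()), "present")
```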
This is a known issue with matplotlib/pylab histograms!
See e.g. https://github.com/matplotlib/matplotlib/issues/6483
where various workarounds are suggested, two favourites (for example from https://stackoverflow.com/a/19090183/1021819) being:
import numpy as np
nbins=100
A=data['alcconsumption']
Anan=A[~np.isnan(A)] # Remove the NaNs
seaborn.distplot(Anan,hist=True,bins=nbins)
Alternatively, specify the bin edges explicitly (in this case still making use of Anan):
Amin=min(Anan)
Amax=max(Anan)
# nbins bins need nbins+1 edges
seaborn.distplot(A,hist=True,bins=np.linspace(Amin,Amax,nbins+1))
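To make the cause and both workarounds concrete without any plotting, here is a hedged NumPy-only sketch. It assumes, in line with the linked issue, that the automatic histogram range comes from the data's minimum and maximum, which both turn into NaN as soon as one NaN is present. The sample values are made up, and np.histogram stands in for the histogram step of the plot.

```python
import numpy as np

# Hypothetical data containing two NaNs.
A = np.array([1.5, np.nan, 3.0, 0.5, np.nan, 2.0])
nbins = 5

# With a NaN present, the plain min and max are NaN, so no valid range exists.
print(np.min(A), np.max(A))

# Workaround 1: drop the NaNs before building the histogram.
Anan = A[~np.isnan(A)]
counts, edges = np.histogram(Anan, bins=nbins)

# Workaround 2: build explicit bin edges from the NaN-ignoring min and max.
# nbins bins need nbins + 1 edges.
explicit_edges = np.linspace(np.nanmin(A), np.nanmax(A), nbins + 1)
counts2, _ = np.histogram(Anan, bins=explicit_edges)

print(counts, counts2)
```

np.nanmin and np.nanmax give the same limits as taking min and max of the filtered array, so either can be used to build the edges.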
You can use the following line to drop the NaN values before passing the column to seaborn.distplot:
seaborn.distplot(data['alcconsumption'].dropna(),hist=True,bins=100)
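A small sketch of selecting the non-NaN values with pandas, using made-up data: notnull() returns a boolean mask rather than the values themselves, so it is used to index the column, while dropna() returns the remaining values directly.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
col = pd.Series([1.5, np.nan, 3.0], name="alcconsumption")

mask = col.notnull()     # booleans, not the values themselves
values_a = col[mask]     # index with the mask to keep the non-NaN values
values_b = col.dropna()  # same result in a single call

print(mask.tolist())
print(values_a.tolist())
```

Either values_a or values_b is what should be passed to the plotting call; passing the mask would plot a distribution of True and False instead.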