
Plotting histogram using seaborn for a dataframe

I have a DataFrame with multiple columns and many rows. Many rows have no value for some columns, so those entries are represented as NaN in the DataFrame. An example of the DataFrame is as follows:

df.head()
GEN Sample_1    Sample_2    Sample_3    Sample_4    Sample_5    Sample_6    Sample_7    Sample_8    Sample_9    Sample_10   Sample_11   Sample_12   Sample_13   Sample_14
A123    9.4697  3.19689 4.8946  8.54594 13.2568 4.93848 3.16809 NAN NAN NAN NAN NAN NAN NAN
A124    6.02592 4.0663  3.9218  2.66058 4.38232         NAN NAN NAN NAN NAN NAN NAN
A125    7.88999 2.51576 4.97483 5.8901  21.1346 5.06414 15.3094 2.68169 8.12449 NAN NAN NAN NAN NAN
A126    5.99825 10.2186 15.2986 7.53729 4.34196 8.75048 16.9358 5.52708 NAN NAN NAN NAN NAN NAN
A127    28.5014 4.86702 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN

I wanted to plot a histogram for this DataFrame using seaborn in Python, so I tried the following lines:

sns.set(color_codes=True)
sns.set(style="white", palette="muted")
sns.distplot(df)

But it throws the following error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-80-896d7fe85ef3> in <module>()
          1 sns.set(color_codes=True)
          2 sns.set(style="white", palette="muted")
    ----> 3 sns.distplot(df)

    /anaconda3/lib/python3.4/site-packages/seaborn/distributions.py in distplot(a, bins, hist, kde, rug, fit, hist_kws, kde_kws, rug_kws, fit_kws, color, vertical, norm_hist, axlabel, label, ax)
        210         hist_color = hist_kws.pop("color", color)
        211         ax.hist(a, bins, orientation=orientation,
    --> 212                 color=hist_color, **hist_kws)
        213         if hist_color != color:
        214             hist_kws["color"] = hist_color

   /anaconda3/lib/python3.4/site-packages/matplotlib/axes/_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
       5627             color = mcolors.colorConverter.to_rgba_array(color)
       5628             if len(color) != nx:
    -> 5629                 raise ValueError("color kwarg must have one color per dataset")
       5630 
       5631         # We need to do to 'weights' what was done to 'x'

    ValueError: color kwarg must have one color per dataset

Any help or suggestions for getting rid of this error would be greatly appreciated!

user1017373 asked Oct 03 '15 13:10




1 Answer

I had also thought the seaborn documentation mentioned that multiple columns could be plotted simultaneously, each in its own color by default.

But upon re-reading it, I did not find anything like that. Instead, I think I inferred it from this tutorial, which plots a data frame with multiple columns part of the way through.


However, the "solution" is trivial, and hopefully exactly what you're looking for:

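import seaborn as sns

sns.set(color_codes=True)
sns.set(style="white", palette="muted")

# distplot expects a single 1-D array, so plot one column per call;
# drop NaN first so each histogram only sees real values
for col_id in df.columns:
    sns.distplot(df[col_id].dropna())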

By default, this will cycle through the colors, "knowing" which ones have already been used.

Generated image from code above (using different data set)

Note: I used a different data set, since I wasn't sure how to re-create yours.

Mike Williamson answered Oct 15 '22 13:10
