I have a dataFrame which has multiple columns and many rows..Many row has no value for column so in the data frame its represented as NaN. The example dataFrame is as follows,
df.head()
GEN Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6 Sample_7 Sample_8 Sample_9 Sample_10 Sample_11 Sample_12 Sample_13 Sample_14
A123 9.4697 3.19689 4.8946 8.54594 13.2568 4.93848 3.16809 NAN NAN NAN NAN NAN NAN NAN
A124 6.02592 4.0663 3.9218 2.66058 4.38232 NAN NAN NAN NAN NAN NAN NAN
A125 7.88999 2.51576 4.97483 5.8901 21.1346 5.06414 15.3094 2.68169 8.12449 NAN NAN NAN NAN NAN
A126 5.99825 10.2186 15.2986 7.53729 4.34196 8.75048 16.9358 5.52708 NAN NAN NAN NAN NAN NAN
A127 28.5014 4.86702 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
I wanted to plot histogram for this dataFrame using seaborn function from python and so i was trying the following lines,
sns.set(color_codes=True)
sns.set(style="white", palette="muted")
sns.distplot(df)
But its throwing the following error,
ValueError Traceback (most recent call last)
<ipython-input-80-896d7fe85ef3> in <module>()
1 sns.set(color_codes=True)
2 sns.set(style="white", palette="muted")
----> 3 sns.distplot(df)
/anaconda3/lib/python3.4/site-packages/seaborn/distributions.py in distplot(a, bins, hist, kde, rug, fit, hist_kws, kde_kws, rug_kws, fit_kws, color, vertical, norm_hist, axlabel, label, ax)
210 hist_color = hist_kws.pop("color", color)
211 ax.hist(a, bins, orientation=orientation,
--> 212 color=hist_color, **hist_kws)
213 if hist_color != color:
214 hist_kws["color"] = hist_color
/anaconda3/lib/python3.4/site-packages/matplotlib/axes/_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5627 color = mcolors.colorConverter.to_rgba_array(color)
5628 if len(color) != nx:
-> 5629 raise ValueError("color kwarg must have one color per dataset")
5630
5631 # We need to do to 'weights' what was done to 'x'
ValueError: color kwarg must have one color per dataset
Any helps/suggestions to get rid of this error would be greatly appreciated..!!!
To plot a Histogram using Matplotlib, you need to first import the Histogram class from the Matplotlib library. The Histogram class has a plot() method which is used to plot histograms. The plot() method accepts a dataframe column as an argument. The Histogram will be plotted on the column of the dataframe.
Seaborn is built on top of Python's core visualization library Matplotlib, meaning it makes use of Matplotlib functionalities. We have already learned Matplotlib Histogram.
Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures. Seaborn helps you explore and understand your data.
Show the counts of observations in each categorical bin using bars. A count plot can be thought of as a histogram across a categorical, instead of quantitative, variable.
I had also thought the seaborn
documentation mentioned that multiple columns could be plotted simultaneously, and highlighted by color by default.
But upon re-reading, I did not see anything. Instead, I think I inferred it from this tutorial, where part of the way through, the tutorial plots a data frame with multiple columns.
However, the "solution" is trivial, and hopefully exactly what you're looking for:
sns.set(color_codes=True)
sns.set(style="white", palette="muted")
sns.distplot(df)
for col_id in df.columns:
sns.distplot(df[col_id])
By default, this will alter the colors, "knowing" which one has already been used.
Note: I used a different data set, since I wasn't sure how to re-create yours.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With