I have the following dataframe in Python (the actual dataframe is much bigger, just presenting a small sample):
      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83
3  0.47  0.59  1.58        3.14
4  0.49  0.80
I convert the dataframe to numpy using df.values and then pass that to boxplot.
When I try to make a boxplot out of this pandas dataframe, the number of values picked from each column is restricted to the least number of values in a column (in this case, column F). Is there any way I can boxplot all values from each column?
NOTE: I use df.dropna to drop the rows in each column with missing values. However, this is resizing the dataframe to the lowest common denominator of column length, and messing up the plotting.
import prettyplotlib as ppl
import numpy as np
import pandas
import matplotlib as mpl
from matplotlib import pyplot
df = pandas.read_csv(csv_data, index_col=False)  # DataFrame.from_csv was removed in pandas 1.0
df = df.dropna()
labels = ['A', 'B', 'C', 'D', 'E', 'F']
fig, ax = pyplot.subplots()
ppl.boxplot(ax, df.values, xticklabels=labels)
pyplot.show()
Steps. Set the figure size and adjust the padding between and around the subplots. Make a Pandas dataframe, i.e., two-dimensional, size-mutable, potentially heterogeneous tabular data. Make a box and whisker plot, using boxplot() method with width tuple to adjust the box in boxplot.
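The steps above can be sketched roughly as follows. This is only an illustration: the two-column sample data and the particular width values are assumptions, and the widths keyword is simply passed through by pandas to matplotlib's boxplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Step 1: set the figure size and let matplotlib adjust the padding.
plt.rcParams["figure.figsize"] = [7.0, 3.5]
plt.rcParams["figure.autolayout"] = True

# Step 2: build a small DataFrame (sample values, assumed for illustration).
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
})

# Step 3: draw the box plot; the tuple gives one box width per column.
fig, ax = plt.subplots()
df.boxplot(ax=ax, widths=(0.2, 0.5))
```

Call fig.savefig(...) afterwards, or drop the Agg line and use plt.show(), to actually view the result.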
Get the number of columns: len(df.columns). The number of columns of a pandas.DataFrame can be obtained by applying len() to the columns attribute.
A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of box to show the range of the data.
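As a rough illustration of the quantities a box plot draws, the quartiles of a single column can be computed directly. The values below are column E from the sample data. Series.quantile skips NaN, so the missing entry does not affect the result. By default matplotlib draws each whisker out to the farthest data point within 1.5 times the interquartile range of the box.

```python
import numpy as np
import pandas as pd

# Column E from the sample data, including its missing value.
e = pd.Series([1.17, 2.75, 2.83, 3.14, np.nan], name="E")

# quantile() ignores NaN, so only the four real values are used.
q1, median, q3 = e.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # height of the box

print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", iqr)
```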
The right way to do it, without reinventing the wheel, is to use the .boxplot() method in pandas, where NaN values are handled correctly:
In [31]:
print(df)
A B C D E F
0 0.43 0.52 0.96 1.17 1.17 2.85
1 0.43 0.52 1.17 2.72 2.75 2.94
2 0.43 0.53 1.48 2.85 2.83 NaN
3 0.47 0.59 1.58 NaN 3.14 NaN
4 0.49 0.80 NaN NaN NaN NaN
[5 rows x 6 columns]
In [32]:
_ = plt.boxplot(df.values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('1.png')  # keep the NaNs, plot with matplotlib
In [33]:
_ = df.boxplot()
plt.savefig('2.png')  # keep the NaNs, plot with pandas
In [34]:
_ = plt.boxplot(df.dropna().values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('3.png')  # drop every row containing a NaN, plot with matplotlib
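If the plot has to go through matplotlib directly (or a wrapper around it), one possible approach is to drop the NaNs per column rather than per row and pass a list of arrays of different lengths. The following is a minimal sketch under that assumption; the DataFrame is rebuilt by hand from the sample above, and the name per_column is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

# df.dropna() removes whole rows, leaving only the rows complete in every column.
rows_left = len(df.dropna())

# Dropping NaN column by column keeps every real value in each column.
per_column = [df[col].dropna().values for col in df.columns]

fig, ax = plt.subplots()
ax.boxplot(per_column)  # a list of 1-D arrays may have unequal lengths
ax.set_xticks(range(1, len(df.columns) + 1))
ax.set_xticklabels(df.columns)
```

With this data, df.dropna() keeps only the two rows that are complete in every column, while the per-column lists retain all the values each column actually has.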