Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python boxplot out of columns of different lengths

I have the following dataframe in Python (the actual dataframe is much bigger, just presenting a small sample):

      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83  
3  0.47  0.59  1.58        3.14  
4  0.49  0.80        

I convert the dataframe to numpy using df.values and then pass that to boxplot.

When I try to make a boxplot out of this pandas dataframe, the number of values picked from each column is restricted to the least number of values in a column (in this case, column F). Is there any way I can boxplot all values from each column?

NOTE: I use df.dropna to drop the rows in each column with missing values. However, this is resizing the dataframe to the lowest common denominator of column length, and messing up the plotting.

import prettyplotlib as ppl
import numpy as np
import pandas
import matplotlib as mpl
from matplotlib import pyplot

df = pandas.DataFrame.from_csv(csv_data,index_col=False)
df = df.dropna()
labels = ['A', 'B', 'C', 'D', 'E', 'F']
fig, ax = pyplot.subplots()
ppl.boxplot(ax, df.values, xticklabels=labels)
pyplot.show()
like image 985
user308827 Avatar asked Apr 17 '14 21:04

user308827


People also ask

How do you adjust a Boxplot in Python?

Steps. Set the figure size and adjust the padding between and around the subplots. Make a Pandas dataframe, i.e., two-dimensional, size-mutable, potentially heterogeneous tabular data. Make a box and whisker plot, using boxplot() method with width tuple to adjust the box in boxplot.

How do I find the length of a column in pandas?

Get the number of columns: len(df. columns) The number of columns of pandas. DataFrame can be obtained by applying len() to the columns attribute.

What is box plot in pandas?

A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of box to show the range of the data.


1 Answers

The right way to do it, saving from reinventing the wheel, would be to use the .boxplot() in pandas, where the nan handled correctly:

In [31]:

print df
      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83   NaN
3  0.47  0.59  1.58   NaN  3.14   NaN
4  0.49  0.80   NaN   NaN   NaN   NaN

[5 rows x 6 columns]
In [32]:

_=plt.boxplot(df.values)
_=plt.xticks(range(1,7),labels)
plt.savefig('1.png') #keeping the nan's and plot by plt

enter image description here

In [33]:

_=df.boxplot()
plt.savefig('2.png') #keeping the nan's and plot by pandas

enter image description here

In [34]:

_=plt.boxplot(df.dropna().values)
_=plt.xticks(range(1,7),labels)
plt.savefig('3.png') #dropping the nan's and plot by plt

enter image description here

like image 99
CT Zhu Avatar answered Sep 20 '22 20:09

CT Zhu