
How to get boxplot data for matplotlib boxplots

I need to get the statistical data that pandas generates to draw a box plot from a DataFrame, i.e. the first quartile, the median, the third quartile, the lower and upper whisker values, and the outliers. I tried the following code to draw the box plot.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.boxplot(return_type='both')

Is there a way to do it instead of manually calculating the values?

Manura Omal asked Nov 04 '15 09:11


2 Answers

One option is to read the y data back from the plotted lines, which is probably most useful for the outliers (fliers):

# return_type='both' returns (axes, dict of matplotlib Line2D lists)
_, bp = df.boxplot(return_type='both')

outliers = [flier.get_ydata() for flier in bp["fliers"]]
boxes = [box.get_ydata() for box in bp["boxes"]]
medians = [median.get_ydata() for median in bp["medians"]]
whiskers = [whisker.get_ydata() for whisker in bp["whiskers"]]
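The arrays returned by get_ydata() are line coordinates rather than named statistics, so they still need to be reduced to scalars. The following is a minimal sketch of one way to do that. It assumes the default matplotlib layout (boxes drawn as lines rather than patches, no notches, and two whisker lines per column in lower-then-upper order); the `summary` dict and its keys are names chosen here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
_, bp = df.boxplot(return_type='both')

summary = {}
for i, col in enumerate(df.columns):
    box_y = np.asarray(bp["boxes"][i].get_ydata())            # corners of the box
    lo_y = np.asarray(bp["whiskers"][2 * i].get_ydata())      # assumed: lower whisker line
    hi_y = np.asarray(bp["whiskers"][2 * i + 1].get_ydata())  # assumed: upper whisker line
    summary[col] = {
        "q1": float(box_y.min()),
        "median": float(np.asarray(bp["medians"][i].get_ydata())[0]),
        "q3": float(box_y.max()),
        "whisker_lo": float(lo_y.min()),
        "whisker_hi": float(hi_y.max()),
        "outliers": np.asarray(bp["fliers"][i].get_ydata()).tolist(),
    }

print(pd.DataFrame(summary).T)
```

Reading values off the artists guarantees they match what was drawn, at the cost of depending on how matplotlib orders its lines.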

But it's probably more straightforward to get the other values (the quartiles and, from them, the IQR as Q3 - Q1) using either

quantiles = df.quantile([0.01, 0.25, 0.5, 0.75, 0.99])

or, as suggested by WoodChopper

stats = df.describe()
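Neither quantile() nor describe() returns whisker positions or outliers directly. As a rough sketch, they can be derived from the quartiles, assuming the common 1.5 × IQR convention (matplotlib's default `whis=1.5`). The variable names below are illustrative and not part of any API.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=['A', 'B', 'C', 'D', 'E'])

q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Fences: points outside them are treated as outliers (assumed 1.5 * IQR rule)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences
inside = df[(df >= lower_fence) & (df <= upper_fence)]
whisker_lo = inside.min()
whisker_hi = inside.max()

outliers = {
    col: df.loc[(df[col] < lower_fence[col]) | (df[col] > upper_fence[col]), col].tolist()
    for col in df.columns
}

summary = pd.DataFrame({
    "q1": q1, "median": df.median(), "q3": q3, "iqr": iqr,
    "whisker_lo": whisker_lo, "whisker_hi": whisker_hi,
})
print(summary)
print(outliers)
```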
philngo answered Oct 20 '22 04:10


  • To get the boxplot data, use matplotlib.cbook.boxplot_stats, which returns a list of dictionaries of the statistics used to draw a series of box-and-whisker plots with matplotlib.axes.Axes.bxp.
    • To get the boxplot statistics, pass an array to boxplot_stats.
      • This is not specific to pandas.
  • The default plotting backend for pandas is matplotlib, so with default settings boxplot_stats returns the same metrics that pandas.DataFrame.plot.box draws.
  • Pass the numeric columns of interest to boxplot_stats as an array, using df.values.
  • The columns must not contain NaN values.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import numpy as np

# test dataframe
np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])

# plot the dataframe as needed
ax = df.plot.box(figsize=(8, 6), showmeans=True, grid=True)


  • Extract the boxplot metrics by passing an array to boxplot_stats.
    • df.values is a numpy.ndarray.
  • The dicts are in the same order as the columns of df.
  • This data has no outliers (fliers) because np.random.rand draws uniformly from [0, 1), so in this sample no point lies more than 1.5 × IQR beyond the quartiles.
# get stats
stats = boxplot_stats(df.values)

print(stats)
[out]:
[{'cihi': 0.6008396701195271,
  'cilo': 0.45316512285356997,
  'fliers': array([], dtype=float64),
  'iqr': 0.47030110594253877,
  'mean': 0.49412631128104645,
  'med': 0.5270023964865486,
  'q1': 0.2603486498337239,
  'q3': 0.7306497557762627,
  'whishi': 0.9941975539538199,
  'whislo': 0.00892072823759571},
 {'cihi': 0.5460977498205477,
  'cilo': 0.39283808760835964,
  'fliers': array([], dtype=float64),
  'iqr': 0.4880880962171596,
  'mean': 0.47578540593013985,
  'med': 0.4694679187144537,
  'q1': 0.2466015651284032,
  'q3': 0.7346896613455628,
  'whishi': 0.9906905357196321,
  'whislo': 0.002613905425137064},
 {'cihi': 0.6327876179340386,
  'cilo': 0.47317829117336885,
  'fliers': array([], dtype=float64),
  'iqr': 0.5083099578365278,
  'mean': 0.5202481643792808,
  'med': 0.5529829545537037,
  'q1': 0.24608370844800756,
  'q3': 0.7543936662845353,
  'whishi': 0.9968264819096214,
  'whislo': 0.008450848029956215},
 {'cihi': 0.5429786764060252,
  'cilo': 0.40089287519667627,
  'fliers': array([], dtype=float64),
  'iqr': 0.4525025516221303,
  'mean': 0.4948030963370377,
  'med': 0.4719357758013507,
  'q1': 0.279181107815125,
  'q3': 0.7316836594372553,
  'whishi': 0.9836196084903415,
  'whislo': 0.019864664399723786},
 {'cihi': 0.5413819754851169,
  'cilo': 0.3838462046931251,
  'fliers': array([], dtype=float64),
  'iqr': 0.5017062764076173,
  'mean': 0.4922357500877824,
  'med': 0.462614090089121,
  'q1': 0.2490034171367362,
  'q3': 0.7507096935443536,
  'whishi': 0.9984043081918205,
  'whislo': 0.0036707224412856343}]
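The list of dicts carries no column names, and the approach above requires columns without NaN. One possible way to handle both, sketched here, is to call boxplot_stats once per column after dropping missing values and collect the results in a DataFrame. The injected NaN and the names `rows`, `fliers` and `stats_df` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
from matplotlib.cbook import boxplot_stats

np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.loc[0, 'A'] = np.nan  # hypothetical missing value, added to show the NaN handling

rows = {}
fliers = {}
for col in df.columns:
    # boxplot_stats returns a list with one dict per dataset; take the only entry
    col_stats = boxplot_stats(df[col].dropna().to_numpy())[0]
    fliers[col] = col_stats.pop("fliers")  # keep the outlier arrays separately
    rows[col] = col_stats

# One row per column of df, one column per statistic
stats_df = pd.DataFrame.from_dict(rows, orient="index")
print(stats_df[["whislo", "q1", "med", "q3", "whishi", "iqr"]])
```

Keeping the fliers in a separate dict avoids storing variable-length arrays in DataFrame cells.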
Trenton McKinney answered Oct 20 '22 05:10