I need to get the statistics that pandas computes when it draws a box plot from a DataFrame, i.e. Quartile 1, Quartile 2, Quartile 3, the lower whisker value, the upper whisker value, and the outliers. I tried the following code to draw the boxplot.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
pd.DataFrame.boxplot(df,return_type = 'both')
Is there a way to do it instead of manually calculating the values?
How to interpret the box plot? The bottom of the (green) box is the 25th percentile and the top is the 75th percentile of the data. So the box represents the middle 50% of all the data points, the core region where the data is situated.
Creating a box plot: the pyplot module of the matplotlib library provides the boxplot() function for creating box plots. The data passed to ax.boxplot() can be a NumPy array, a Python list, or a tuple of arrays. Let us create a box plot using NumPy data.
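A minimal sketch of such a call, assuming matplotlib and NumPy are installed. The sample data, the seed, and the variable names are illustrative and not from the original post. It passes a list of NumPy arrays to ax.boxplot() and shows that the returned dictionary holds the line artists that were drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# three illustrative samples with different spreads
data = [rng.normal(0, std, 100) for std in (1, 2, 3)]

fig, ax = plt.subplots()
bp = ax.boxplot(data)  # a list of arrays; a 2D array would also work

# the returned dict maps artist names to lists of Line2D objects
print(sorted(bp.keys()))
# one box per sample, two whiskers per box
print(len(bp["boxes"]), len(bp["whiskers"]))
```

Each box has two whisker lines (lower and upper), which is why the "whiskers" list is twice as long as the "boxes" list.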
One option is to read the y data back from the plotted artists, which is probably most useful for the outliers (fliers):
_, bp = pd.DataFrame.boxplot(df, return_type='both')
outliers = [flier.get_ydata() for flier in bp["fliers"]]
boxes = [box.get_ydata() for box in bp["boxes"]]
medians = [median.get_ydata() for median in bp["medians"]]
whiskers = [whisker.get_ydata() for whisker in bp["whiskers"]]
But it's probably more straightforward to get the other values (including IQR) using either
quantiles = df.quantile([0.01, 0.25, 0.5, 0.75, 0.99])
or, as suggested by WoodChopper
stats = df.describe()
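Neither quantile() nor describe() returns the whisker ends or the outliers directly. As a rough sketch, they can be derived from the quartiles, assuming the common 1.5 * IQR rule (matplotlib's default whis=1.5) and a DataFrame without NaN values. The names lower_fence, upper_fence, inside, and summary are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=list("ABCDE"))

q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# fences at 1.5 * IQR beyond the quartiles (assumed rule, see lead-in)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# keep only the points inside the fences; everything else becomes NaN
inside = df.where(df.ge(lower_fence, axis=1) & df.le(upper_fence, axis=1))

# whiskers end at the most extreme data points that are still inside the fences
summary = pd.DataFrame({
    "q1": q1, "median": df.median(), "q3": q3, "iqr": iqr,
    "whislo": inside.min(), "whishi": inside.max(),
})

# points outside the fences are the outliers (fliers), listed per column
outliers = {col: df[col][inside[col].isna()].tolist() for col in df}
print(summary)
print(outliers)
```

The whiskers are the extreme observations within the fences, not the fence values themselves, so whislo and whishi are always actual data points.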
To get the box plot data, use matplotlib.cbook.boxplot_stats, which returns a list of dictionaries of statistics used to draw a series of box and whisker plots using matplotlib.axes.Axes.bxp.
To get the statistics, pass an array to boxplot_stats. This is not specific to pandas.
The default plotting backend for pandas is matplotlib, so using boxplot_stats will return the correct metrics for pandas.DataFrame.plot.box.
Pass the numeric columns to boxplot_stats as an array, using df.values.
This does not handle NaN values in the columns.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import numpy as np
# test dataframe
np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
# plot the dataframe as needed
ax = df.plot.box(figsize=(8, 6), showmeans=True, grid=True)
Pass the array to boxplot_stats.
df.values is a numpy.ndarray.
The dicts are in the same order as the column arrays from df.
This data has no fliers, because it was generated with numpy.random.
# get stats
stats = boxplot_stats(df.values)
print(stats)
[out]:
[{'cihi': 0.6008396701195271,
'cilo': 0.45316512285356997,
'fliers': array([], dtype=float64),
'iqr': 0.47030110594253877,
'mean': 0.49412631128104645,
'med': 0.5270023964865486,
'q1': 0.2603486498337239,
'q3': 0.7306497557762627,
'whishi': 0.9941975539538199,
'whislo': 0.00892072823759571},
{'cihi': 0.5460977498205477,
'cilo': 0.39283808760835964,
'fliers': array([], dtype=float64),
'iqr': 0.4880880962171596,
'mean': 0.47578540593013985,
'med': 0.4694679187144537,
'q1': 0.2466015651284032,
'q3': 0.7346896613455628,
'whishi': 0.9906905357196321,
'whislo': 0.002613905425137064},
{'cihi': 0.6327876179340386,
'cilo': 0.47317829117336885,
'fliers': array([], dtype=float64),
'iqr': 0.5083099578365278,
'mean': 0.5202481643792808,
'med': 0.5529829545537037,
'q1': 0.24608370844800756,
'q3': 0.7543936662845353,
'whishi': 0.9968264819096214,
'whislo': 0.008450848029956215},
{'cihi': 0.5429786764060252,
'cilo': 0.40089287519667627,
'fliers': array([], dtype=float64),
'iqr': 0.4525025516221303,
'mean': 0.4948030963370377,
'med': 0.4719357758013507,
'q1': 0.279181107815125,
'q3': 0.7316836594372553,
'whishi': 0.9836196084903415,
'whislo': 0.019864664399723786},
{'cihi': 0.5413819754851169,
'cilo': 0.3838462046931251,
'fliers': array([], dtype=float64),
'iqr': 0.5017062764076173,
'mean': 0.4922357500877824,
'med': 0.462614090089121,
'q1': 0.2490034171367362,
'q3': 0.7507096935443536,
'whishi': 0.9984043081918205,
'whislo': 0.0036707224412856343}]
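The list of dicts can be awkward to read. One possible way to tabulate it, sketched here under the assumption that NaN values should simply be dropped per column before the statistics are computed, is to call boxplot_stats one column at a time and collect the results into a DataFrame indexed by column name. The injected NaN and the names rows, fliers, and stats_df are illustrative.

```python
import numpy as np
import pandas as pd
from matplotlib.cbook import boxplot_stats

np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=list("ABCDE"))
df.loc[0, "A"] = np.nan  # illustrative missing value

rows, fliers = {}, {}
for col in df.columns:
    # a 1D array yields a list with a single dict of statistics
    col_stats = boxplot_stats(df[col].dropna().to_numpy())[0]
    fliers[col] = col_stats.pop("fliers")  # keep the outlier arrays separately
    rows[col] = col_stats

# one row per column of df, one column per statistic
stats_df = pd.DataFrame.from_dict(rows, orient="index")
print(stats_df[["whislo", "q1", "med", "q3", "whishi", "iqr"]])
```

Keeping the fliers in their own dict avoids storing variable-length arrays inside the table.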