Interactive boxplot with pandas and Jupyter notebook

I have data in a pandas DataFrame, and I want to create an interactive boxplot that lets me select the number of days and then draws one box for the values in each category of the 'category' column.

This is what my code/data looks like so far:

import numpy as np
import pandas as pd

categories=('A','B','C')

data = {
            'days':      np.random.randint(120, size=100), 
            'category':  np.random.choice(categories, 100),
            'value':     100.0 * np.random.random_sample(100)
       }

df = pd.DataFrame(data)

print(df)

   category  days      value
0         A     4  77.383981
1         A    31  63.011934
2         A     5   1.165061
3         C    59  23.588979
4         A    57  14.906734
5         C   106  33.366634
6         A    29  90.658570
7         B    25  16.137490
8         A   118  34.526302
9         C    76   4.111797
10        A    11  30.195917
..      ...   ...        ...
90        A    64  37.529774
91        A    76   3.771360
92        C   112  93.948775
93        C    14  34.855189
94        B    64  83.106007
95        A    10  78.346319
96        B    86  66.645889
97        A    46  12.969012
98        C    29  57.925427
99        A    59  34.526146

[100 rows x 3 columns]

I want to create a boxplot of the values for each of the categories (for a selected/specified number of days), with the different categories being plotted along the X-axis.

How do I do that using pandas (or matplotlib)?

asked Jan 15 '17 23:01 by Homunculus Reticulli


1 Answer

You can simply filter the dataframe by the number of days and then plot the respective boxplot.

number_of_days = 42
df_filtered = df.loc[df['days'] < number_of_days]  # use operators like ==, >=, <, etc.
df_filtered[["category", "value"]].boxplot(by="category", return_type='axes')
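
Since the question also mentions matplotlib, the same filter-then-plot idea can be sketched with Axes.boxplot directly. This is a minimal sketch, not part of the original answer: the helper name boxplot_by_category, its max_days parameter, and the seeded sample data are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data in the same shape as the question's DataFrame
# (seeded here so the sketch is reproducible; the seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "days": rng.integers(0, 120, size=100),
    "category": rng.choice(["A", "B", "C"], size=100),
    "value": 100.0 * rng.random(100),
})


def boxplot_by_category(frame, max_days, ax=None):
    """Draw one box per category for rows with days < max_days (hypothetical helper)."""
    subset = frame[frame["days"] < max_days]
    # Collect the values of each category that is present, in sorted category order.
    grouped = {name: group["value"].to_numpy()
               for name, group in subset.groupby("category")}
    if ax is None:
        _, ax = plt.subplots()
    if grouped:  # skip drawing when the filter leaves no rows
        ax.boxplot(list(grouped.values()))
        # Boxes sit at positions 1..n, so label those positions with the category names.
        ax.set_xticks(range(1, len(grouped) + 1))
        ax.set_xticklabels(list(grouped.keys()))
    ax.set_xlabel("category")
    ax.set_ylabel("value")
    ax.set_title(f"days < {max_days}")
    return ax, grouped


ax, grouped = boxplot_by_category(df, max_days=42)
```

Returning the grouped values alongside the axes makes it easy to check which rows ended up in each box; the comparison operator inside the helper can be swapped in the same way as in the pandas one-liner above.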


To get a dropdown field, you can use the ipywidgets.interact() function and pass it a function that plots the DataFrame for the selected day. (In the following, the number of days is restricted to 12, so that a dropdown actually makes sense for selecting a single day.)

import numpy as np
import pandas as pd
from ipywidgets import interact
%matplotlib notebook

categories=('A','B','C')

data = {
            'days':      np.random.randint(12, size=100), 
            'category':  np.random.choice(categories, 100),
            'value':     100.0 * np.random.random_sample(100)
       }

df = pd.DataFrame(data)

def select_days(number_of_days):
    # The dropdown passes the selected day as a string, so convert it before comparing.
    df_filtered = df.loc[df['days'] == int(number_of_days)]
    ax = df_filtered[["category", "value"]].boxplot(by="category", return_type='axes')
    ax["value"].set_title("Day " + number_of_days)
    print(df_filtered)

days = [str(day) for day in np.arange(12)]

interact(select_days, number_of_days=days)
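
If the aim is to pick a maximum number of days rather than a single day, a slider may fit better than a dropdown. The sketch below assumes that passing a (min, max) tuple to interact() yields an integer slider, as the ipywidgets documentation describes, and that the notebook uses an inline backend. The names plot_up_to and max_days are hypothetical, and the import guard only exists so the plotting function can also be run where ipywidgets is not installed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

try:
    from ipywidgets import interact
except ImportError:  # assumption: fall back to plain function calls without widgets
    interact = None

# Seeded sample data in the same shape as the question's DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "days": rng.integers(0, 120, size=100),
    "category": rng.choice(["A", "B", "C"], size=100),
    "value": 100.0 * rng.random(100),
})


def plot_up_to(max_days):
    """Boxplot of value per category for all rows with days <= max_days."""
    subset = df[df["days"] <= max_days]
    fig, ax = plt.subplots()
    if subset.empty:
        # Nothing to draw for this selection, so only report it in the title.
        ax.set_title(f"No rows with days <= {max_days}")
    else:
        subset.boxplot(column="value", by="category", ax=ax)
        ax.set_title(f"days <= {max_days}")
        fig.suptitle("")  # drop the automatic "Boxplot grouped by category" heading
    plt.show()


if interact is not None:
    # A (min, max) tuple of integers is shorthand for an integer slider.
    interact(plot_up_to, max_days=(0, 119))
```

Each slider move calls plot_up_to again with the new value, so the filter and the plot stay in one place and the widget only supplies the argument.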

answered Oct 26 '22 23:10 by ImportanceOfBeingErnest