I have data in a pandas DataFrame, and I want to create an interactive boxplot that lets me select the number of days and then plots a boxplot of the values for each of the categories in the column 'category'.
This is what my code/data looks like so far:
import numpy as np
import pandas as pd
categories=('A','B','C')
data = {
'days': np.random.randint(120, size=100),
'category': np.random.choice(categories, 100),
'value': 100.0 * np.random.random_sample(100)
}
df = pd.DataFrame(data)
print(df)
category days value
0 A 4 77.383981
1 A 31 63.011934
2 A 5 1.165061
3 C 59 23.588979
4 A 57 14.906734
5 C 106 33.366634
6 A 29 90.658570
7 B 25 16.137490
8 A 118 34.526302
9 C 76 4.111797
10 A 11 30.195917
.. ... ... ...
90 A 64 37.529774
91 A 76 3.771360
92 C 112 93.948775
93 C 14 34.855189
94 B 64 83.106007
95 A 10 78.346319
96 B 86 66.645889
97 A 46 12.969012
98 C 29 57.925427
99 A 59 34.526146
[100 rows x 3 columns]
I want to create a boxplot of the values for each of the categories (for a selected/specified number of days), with the different categories being plotted along the X-axis.
How do I do that using pandas (or matplotlib)?
Usually, displaying plots involves calling the show() function from PyPlot. In Jupyter notebooks this isn't necessary, because plots are displayed after running the cells containing the code that generates them. By default these plots are displayed inline, which means they're displayed in the notebook itself.
In JupyterLab, create a new (Python 3) notebook. In the first cell of the notebook, you can import pandas and check its version. You are then ready to use pandas, and you can write your code in the next cells.
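The snippet for that first cell is not shown here. As an assumption about what it contained, a minimal version check could look like this:

```python
import pandas as pd

# Print the installed pandas version to confirm the import works
print(pd.__version__)
```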
You can simply filter the dataframe by the number of days and then plot the respective boxplot.
number_of_days = 42
# Keep only the rows below the threshold; other comparisons (==, >=, <, etc.) work the same way
df_filtered = df.loc[df['days'] < number_of_days]
df_filtered[["category", "value"]].boxplot(by="category", return_type='axes')
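If you want to switch between comparison operators without editing the filter line, the filtering step could be wrapped in a small helper. This is only a sketch: the function name filter_by_days, the op parameter and the COMPARISONS table are hypothetical names chosen for illustration, and the sample data is regenerated with a fixed seed so the example runs on its own.

```python
import operator

import numpy as np
import pandas as pd

# Hypothetical lookup table mapping operator symbols to comparison functions
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def filter_by_days(frame, number_of_days, op="<"):
    """Return the rows where `days <op> number_of_days` holds."""
    mask = COMPARISONS[op](frame["days"], number_of_days)
    return frame.loc[mask]


# Same shape of data as in the question, seeded for reproducibility
rng = np.random.default_rng(0)
categories = ("A", "B", "C")
df = pd.DataFrame({
    "days": rng.integers(120, size=100),
    "category": rng.choice(categories, 100),
    "value": 100.0 * rng.random(100),
})

df_filtered = filter_by_days(df, 42, "<")
print(df_filtered.head())
```

The result can then be passed to the same boxplot call as before, for example df_filtered[["category", "value"]].boxplot(by="category").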
To make the selection interactive, you can use the ipywidgets.interact() function, to which you provide a function that plots the dataframe for that specific day.
(In the following I restricted the number of days to 12, such that a dropdown actually makes sense for selecting a single day out of those.)
import numpy as np
import pandas as pd
from ipywidgets import interact
%matplotlib notebook
categories=('A','B','C')
data = {
'days': np.random.randint(12, size=100),
'category': np.random.choice(categories, 100),
'value': 100.0 * np.random.random_sample(100)
}
df = pd.DataFrame(data)
def select_days(number_of_days):
    # The dropdown passes its value as a string, so convert it before comparing
    df_filtered = df.loc[df['days'] == int(number_of_days)]
    ax = df_filtered[["category", "value"]].boxplot(by="category", return_type='axes')
    ax["value"].set_title("Day " + number_of_days)
    print(df_filtered)

# Passing a list of options makes interact() render a dropdown
days = [str(day) for day in np.arange(12)]
interact(select_days, number_of_days=days)
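To check numerically what the boxplot for a given day is drawing, the quartiles per category can be computed with groupby. This is a sketch rather than part of the original answer: the function name summarize_day is hypothetical, and the data is regenerated with a fixed seed so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Same shape of data as above, seeded for reproducibility
rng = np.random.default_rng(0)
categories = ("A", "B", "C")
df = pd.DataFrame({
    "days": rng.integers(12, size=100),
    "category": rng.choice(categories, 100),
    "value": 100.0 * rng.random(100),
})


def summarize_day(frame, day):
    """Return the 25%, 50% and 75% quantiles of 'value' per category for one day."""
    subset = frame.loc[frame["days"] == int(day)]
    quartiles = subset.groupby("category")["value"].quantile([0.25, 0.5, 0.75])
    # Move the quantile level into columns: one row per category
    return quartiles.unstack()


# Use a day that is guaranteed to occur in the data
some_day = df["days"].iloc[0]
summary = summarize_day(df, some_day)
print(summary)
```

Each row of the result corresponds to one box: the 0.5 column is the median line, and the 0.25 and 0.75 columns are the lower and upper edges of the box.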