I plotted a box plot in Bokeh and another in matplotlib. Plotting in Bokeh was about 100 times slower for the same data. Why does Bokeh take so long? Here is the code, which I ran in a Jupyter notebook:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from bokeh.charts import BoxPlot, output_notebook, show
from time import time
%matplotlib inline
# Generate data
N = 100000
x1 = 2 + np.random.randn(N)
y1 = ['a'] * N
x2 = -2 + np.random.randn(N)
y2 = ['b'] * N
X = list(x1) + list(x2)
Y = y1 + y2
data = pd.DataFrame()
data['Vals'] = X
data['Class'] = Y
df = data.apply(np.random.permutation)
# Time the bokeh plot
start_time = time()
p = BoxPlot(data, values='Vals', label='Class',\
title="MPG Summary (grouped by CYL, ORIGIN)")
output_notebook()
show(p)
end_time = time()
print("Total time taken for Bokeh is {0}".format(end_time - start_time))
# time the matplotlib plot
start_time = time()
data.boxplot(column='Vals', by='Class', sym = 'o')
end_time = time()
print("Total time taken for matplotlib is {0}".format(end_time - start_time))
The print statements produce the following outputs:
Total time taken for Bokeh is 11.8056321144104
Total time taken for matplotlib is 0.1586170196533203
There is some problem specifically with bokeh.charts.BoxPlot. Unfortunately, bokeh.charts does not have a maintainer at the moment, so I can't state anything about when it might get fixed or improved.
However, in case it is useful to you, I will demonstrate below that you can use the well-established and stable bokeh.plotting API to do things "by hand", and then the time is comparable to if not faster than MPL:
from time import time
import pandas as pd
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()
# Generate data
N = 100000
x1 = 2 + np.random.randn(N)
y1 = ['a'] * N
x2 = -2 + np.random.randn(N)
y2 = ['b'] * N
X = list(x1) + list(x2)
Y = y1 + y2
df = pd.DataFrame()
df['Vals'] = X
df['Class'] = Y
# Time the bokeh plot
start_time = time()
# find the quartiles and IQR for each category
groups = df.groupby('Class')
q1 = groups.quantile(q=0.25)
q2 = groups.quantile(q=0.5)
q3 = groups.quantile(q=0.75)
iqr = q3 - q1
upper = q3 + 1.5*iqr
lower = q1 - 1.5*iqr
cats = ['a', 'b']
p = figure(x_range=cats)
# if no outliers, shrink lengths of stems to be no longer than the minimums or maximums
qmin = groups.quantile(q=0.00)
qmax = groups.quantile(q=1.00)
upper.Vals = [min([x,y]) for (x,y) in zip(list(qmax.loc[:,'Vals']),upper.Vals)]
lower.Vals = [max([x,y]) for (x,y) in zip(list(qmin.loc[:,'Vals']),lower.Vals)]
# stems
p.segment(cats, upper.Vals, cats, q3.Vals, line_color="black")
p.segment(cats, lower.Vals, cats, q1.Vals, line_color="black")
# boxes
p.vbar(cats, 0.7, q2.Vals, q3.Vals, fill_color="#E08E79", line_color="black")
p.vbar(cats, 0.7, q1.Vals, q2.Vals, fill_color="#3B8686", line_color="black")
# whiskers (almost-0 height rects simpler than segments)
p.rect(cats, lower.Vals, 0.2, 0.01, line_color="black")
p.rect(cats, upper.Vals, 0.2, 0.01, line_color="black")
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = "white"
p.grid.grid_line_width = 2
p.xaxis.major_label_text_font_size="12pt"
show(p)
end_time = time()
print("Total time taken for Bokeh is {0}".format(end_time - start_time))
It's a chunk of code but it would be simple enough to wrap up into a re-usable function. For me, the above resulted in:
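Separately from the timing, here is a rough sketch of what the re-usable function mentioned above might look like. The names box_stats and quick_boxplot, and the box_width parameter, are made up for this illustration and are not part of Bokeh. It follows the same approach as the code above: compute the quartiles with pandas, then draw only a handful of glyphs per category with bokeh.plotting. Treat it as a starting point under those assumptions rather than a drop-in replacement for bokeh.charts.BoxPlot (for example, it does not draw outlier points).

```python
import numpy as np
import pandas as pd


def box_stats(df, values, label):
    """Per-category quartiles and whisker ends (hypothetical helper).

    Whiskers sit at 1.5 * IQR beyond the box, clipped to the observed
    minimum and maximum of each category.
    """
    groups = df.groupby(label)[values]
    q1 = groups.quantile(0.25)
    q2 = groups.quantile(0.5)
    q3 = groups.quantile(0.75)
    iqr = q3 - q1
    # Stems should be no longer than the actual data range
    upper = np.minimum(q3 + 1.5 * iqr, groups.max())
    lower = np.maximum(q1 - 1.5 * iqr, groups.min())
    return pd.DataFrame(
        {"q1": q1, "q2": q2, "q3": q3, "lower": lower, "upper": upper}
    )


def quick_boxplot(df, values, label, box_width=0.7):
    """Build a box plot figure with bokeh.plotting (hypothetical helper)."""
    # Imported here so box_stats() can be used without Bokeh installed
    from bokeh.plotting import figure

    s = box_stats(df, values, label)
    cats = [str(c) for c in s.index]
    q1, q2, q3 = s["q1"].tolist(), s["q2"].tolist(), s["q3"].tolist()
    lower, upper = s["lower"].tolist(), s["upper"].tolist()

    p = figure(x_range=cats)
    # stems
    p.segment(cats, upper, cats, q3, line_color="black")
    p.segment(cats, lower, cats, q1, line_color="black")
    # boxes (upper half and lower half, split at the median)
    p.vbar(x=cats, width=box_width, top=q3, bottom=q2,
           fill_color="#E08E79", line_color="black")
    p.vbar(x=cats, width=box_width, top=q2, bottom=q1,
           fill_color="#3B8686", line_color="black")
    # whisker caps (almost-0 height rects)
    p.rect(x=cats, y=lower, width=0.2, height=0.01, line_color="black")
    p.rect(x=cats, y=upper, width=0.2, height=0.01, line_color="black")
    return p
```

With the df from the example above, and after calling output_notebook(), this would be used as show(quick_boxplot(df, 'Vals', 'Class')). Only the five summary numbers per category are handed to Bokeh, so the cost is dominated by the pandas groupby rather than by the plotting.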