
Why is Bokeh so much slower than matplotlib?

I plotted a box plot in Bokeh and another in matplotlib. For the same data, plotting in Bokeh was about 100 times slower. Why does Bokeh take so long? Here is the code, which I ran in a Jupyter notebook:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib as mpl

from bokeh.charts import BoxPlot, output_notebook, show

from time import time

%matplotlib inline


# Generate data
N = 100000
x1 = 2 + np.random.randn(N)
y1 = ['a'] * N

x2 = -2 + np.random.randn(N)
y2 = ['b'] * N

X = list(x1) + list(x2)
Y = y1 + y2

data = pd.DataFrame()
data['Vals'] = X
data['Class'] = Y

df = data.apply(np.random.permutation)


# Time the bokeh plot
start_time = time()

p = BoxPlot(data, values='Vals', label='Class',
            title="MPG Summary (grouped by CYL, ORIGIN)")
output_notebook()
show(p)

end_time = time()
print("Total time taken for Bokeh is {0}".format(end_time - start_time))


# time the matplotlib plot
start_time = time()

data.boxplot(column='Vals', by='Class', sym = 'o')

end_time = time()
print("Total time taken for matplotlib is {0}".format(end_time - start_time))

The print statements produce the following outputs:

Total time taken for Bokeh is 11.8056321144104

Total time taken for matplotlib is 0.1586170196533203

Asked by Shishir Pandey, Mar 09 '17 15:03




1 Answer

The problem is specific to bokeh.charts.BoxPlot. Unfortunately, bokeh.charts does not have a maintainer at the moment, so I can't say when it might be fixed or improved.

However, in case it is useful to you, the code below shows that you can use the well-established and stable bokeh.plotting API to do things "by hand", and then the time is comparable to, if not faster than, matplotlib:

from time import time

import pandas as pd
import numpy as np

from bokeh.io import output_notebook, show
from bokeh.plotting import figure

output_notebook()

# Generate data
N = 100000
x1 = 2 + np.random.randn(N)
y1 = ['a'] * N

x2 = -2 + np.random.randn(N)
y2 = ['b'] * N

X = list(x1) + list(x2)
Y = y1 + y2

df = pd.DataFrame()
df['Vals'] = X
df['Class'] = Y

# Time the bokeh plot
start_time = time()

# find the quartiles and IQR for each category
groups = df.groupby('Class')
q1 = groups.quantile(q=0.25)
q2 = groups.quantile(q=0.5)
q3 = groups.quantile(q=0.75)
iqr = q3 - q1
upper = q3 + 1.5*iqr
lower = q1 - 1.5*iqr

cats = ['a', 'b']

p = figure(x_range=cats)

# if no outliers, shrink lengths of stems to be no longer than the minimums or maximums
qmin = groups.quantile(q=0.00)
qmax = groups.quantile(q=1.00)
# assign to the 'Vals' column (the one plotted below) so the clamping takes effect
upper['Vals'] = [min(x, y) for (x, y) in zip(qmax['Vals'], upper['Vals'])]
lower['Vals'] = [max(x, y) for (x, y) in zip(qmin['Vals'], lower['Vals'])]

# stems
p.segment(cats, upper.Vals, cats, q3.Vals, line_color="black")
p.segment(cats, lower.Vals, cats, q1.Vals, line_color="black")

# boxes
p.vbar(cats, 0.7, q2.Vals, q3.Vals, fill_color="#E08E79", line_color="black")
p.vbar(cats, 0.7, q1.Vals, q2.Vals, fill_color="#3B8686", line_color="black")

# whiskers (almost-0 height rects simpler than segments)
p.rect(cats, lower.Vals, 0.2, 0.01, line_color="black")
p.rect(cats, upper.Vals, 0.2, 0.01, line_color="black")

p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = "white"
p.grid.grid_line_width = 2
p.xaxis.major_label_text_font_size="12pt"

show(p)

end_time = time()
print("Total time taken for Bokeh is {0}".format(end_time - start_time))

It's a fair chunk of code, but it would be simple enough to wrap up in a reusable function. For me, the above resulted in:

[image not preserved]
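As a rough illustration of the "reusable function" idea mentioned above, here is a minimal sketch. The names `box_stats` and `boxplot_figure` are hypothetical helpers made up for this example, not part of Bokeh; the sketch assumes a DataFrame with one numeric column and one grouping column, as in the question, and uses the same 1.5 * IQR whisker rule as the code above.

```python
import numpy as np
import pandas as pd


def box_stats(df, values, label):
    """Per-group quartiles and whisker ends (1.5 * IQR rule).

    Hypothetical helper: `values` and `label` are column names in `df`.
    """
    groups = df.groupby(label)[values]
    stats = pd.DataFrame({
        "q1": groups.quantile(0.25),
        "q2": groups.quantile(0.50),
        "q3": groups.quantile(0.75),
    })
    iqr = stats["q3"] - stats["q1"]
    # Whiskers never extend past the group's actual minimum or maximum.
    stats["upper"] = np.minimum(stats["q3"] + 1.5 * iqr, groups.max())
    stats["lower"] = np.maximum(stats["q1"] - 1.5 * iqr, groups.min())
    return stats


def boxplot_figure(stats, box_width=0.7):
    """Build a Bokeh figure from the table returned by box_stats."""
    # Imported here so box_stats can be used without Bokeh installed.
    from bokeh.plotting import figure

    cats = [str(c) for c in stats.index]
    p = figure(x_range=cats)

    # stems
    p.segment(x0=cats, y0=stats["upper"], x1=cats, y1=stats["q3"], line_color="black")
    p.segment(x0=cats, y0=stats["lower"], x1=cats, y1=stats["q1"], line_color="black")

    # boxes (upper half and lower half of each box)
    p.vbar(x=cats, width=box_width, top=stats["q3"], bottom=stats["q2"],
           fill_color="#E08E79", line_color="black")
    p.vbar(x=cats, width=box_width, top=stats["q2"], bottom=stats["q1"],
           fill_color="#3B8686", line_color="black")

    # whisker caps (almost-zero-height rects)
    p.rect(x=cats, y=stats["lower"], width=0.2, height=0.01, line_color="black")
    p.rect(x=cats, y=stats["upper"], width=0.2, height=0.01, line_color="black")
    return p


# Small made-up data set, just to exercise the statistics step.
demo = pd.DataFrame({
    "Vals": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
    "Class": ["a"] * 5 + ["b"] * 5,
})
stats = box_stats(demo, "Vals", "Class")
print(stats)
```

In a notebook, something like `show(boxplot_figure(box_stats(df, 'Vals', 'Class')))` after `output_notebook()` should display the plot. Keeping the statistics separate from the drawing also makes it easy to time the two steps independently.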

Answered by bigreddot, Oct 16 '22 12:10