I'd like to select some points on a plot (e.g. from box_select or lasso_select) and retrieve them in a Jupyter notebook for further data exploration. How can I do that?
For instance, in the code below, how can I export the selection from Bokeh to the notebook? If I need a Bokeh server, that is fine too (I saw in the docs that I could add "two-way communication" with a server, but I did not manage to adapt the example to reach my goal).
from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource
output_notebook()
x = [random() for x in range(1000)]
y = [random() for y in range(1000)]
s = ColumnDataSource(data=dict(x=x, y=y))
fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)
show(fig)
# Select on the plot
# Get selection in a ColumnDataSource, or index list, or pandas object, or etc.?
Notes
Standalone Bokeh content (i.e. that does not use a Bokeh server) can be embedded directly in classic Jupyter notebooks as well as in JupyterLab.
Displaying a Bokeh figure in a Jupyter notebook is very similar to the above. The only change you need to make is to import output_notebook instead of output_file from the bokeh.plotting module. Enter the code in a notebook cell and run it.
To select some points on a plot and retrieve them in a Jupyter notebook, you can use a CustomJS callback.
Within the CustomJS callback's JavaScript code, you can access the Jupyter notebook kernel using IPython.notebook.kernel. Then, you can use kernel.execute(python_code) to run Python code and (for example) export data from the JavaScript call to the Jupyter notebook. Note that the IPython.notebook object is provided by the classic Jupyter Notebook front end; JupyterLab does not expose it.
So, a bokeh server is not necessary for two-way communication between the bokeh plot and the Jupyter notebook.
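To see what the kernel actually receives, it helps to look at the string that the JavaScript side builds. Concatenating a string with an array in JavaScript joins the elements with commas and drops the brackets. The sketch below imitates that in plain Python; js_concat and run_in_kernel are hypothetical helpers that only stand in for the browser and for kernel.execute, so treat it as an illustration rather than as Bokeh or Jupyter API.

```python
def js_concat(prefix, indices):
    """Imitate JavaScript's `prefix + array`: elements joined by commas, no brackets."""
    return prefix + ",".join(str(i) for i in indices)


def run_in_kernel(statement):
    """Stand-in for kernel.execute: run the statement and return selected_indices."""
    namespace = {}
    exec(statement, namespace)
    return namespace["selected_indices"]


def bracketed(indices):
    """Build the statement with the joined values wrapped in square brackets."""
    return "selected_indices = [" + js_concat("", indices) + "]"


# Several selected points: the statement is "selected_indices = 3,7,42", a tuple.
several = run_in_kernel(js_concat("selected_indices = ", [3, 7, 42]))

# One selected point: the statement is "selected_indices = 5", a bare int.
single = run_in_kernel(js_concat("selected_indices = ", [5]))

# With brackets the result is a list in every case, including an empty selection.
as_list = run_in_kernel(bracketed([5]))
empty = run_in_kernel(bracketed([]))
```

Without the brackets, an empty selection would produce the statement "selected_indices = ", which is not valid Python, so adding the brackets on the JavaScript side is the more robust choice.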
Below, I have extended your example code to include a CustomJS callback that triggers on a selection geometry event in the figure. Whenever a selection is made, the callback runs and exports the indices of the selected data points to a variable within the Jupyter notebook called selected_indices.
To obtain a ColumnDataSource that contains the selected data points, the indices in selected_indices are looped through to create lists of the selected x and y values, which are then passed to a ColumnDataSource constructor.
from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource
from bokeh.models.callbacks import CustomJS
output_notebook()
x = [random() for x in range(1000)]
y = [random() for y in range(1000)]
s = ColumnDataSource(data=dict(x=x, y=y))
fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)
# make a custom JavaScript callback that exports the indices of the selected points to the Jupyter notebook
callback = CustomJS(args=dict(s=s),
                    code="""
    console.log('Running CustomJS callback now.');
    var indices = s.selected.indices;
    var kernel = IPython.notebook.kernel;
    // wrap in brackets so Python always receives a list,
    // even for a single point or an empty selection
    kernel.execute("selected_indices = [" + indices + "]");
    """)
# set the callback to run when a selection geometry event occurs in the figure
fig.js_on_event('selectiongeometry', callback)
show(fig)
# make a selection using a selection tool
# inspect the selected indices
selected_indices
# use the indices to create lists of the selected values
x_selected, y_selected = [], []
for index in selected_indices:
    x_val = s.data['x'][index]
    y_val = s.data['y'][index]
    x_selected.append(x_val)
    y_selected.append(y_val)
# make a column data source containing the selected values
selected = ColumnDataSource(data=dict(x=x_selected, y=y_selected))
# inspect the selected data
selected.data
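The loop above can also be written as a small helper that works for any number of columns. This is only a sketch: select_rows is a hypothetical name, and the plain dict data stands in for s.data. It also accepts a bare integer, which is what selected_indices can end up as when the exported statement contains a single index without brackets.

```python
def select_rows(data, indices):
    """Return a dict with the same columns as `data`, keeping only `indices`."""
    if isinstance(indices, int):
        # a single exported index may arrive as a bare int rather than a sequence
        indices = [indices]
    return {name: [column[i] for i in indices] for name, column in data.items()}


# plain dict of columns, standing in for s.data
data = {"x": [0.1, 0.2, 0.3, 0.4], "y": [1.0, 2.0, 3.0, 4.0]}
picked = select_rows(data, (0, 2))
```

In the notebook, the result of select_rows(s.data, selected_indices) could be passed to ColumnDataSource(data=...) in the same way as the x_selected and y_selected lists.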
If you have a Bokeh server running, you can access the selection indices of a data source via datasource.selected.indices. The following is an example of how you would do this (modified from the official Embed a Bokeh Server Into Jupyter example):
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import show, output_notebook
from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature
output_notebook()
df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)
def bkapp(doc):
    plot = figure(x_axis_type='datetime', y_range=(0, 25), tools="lasso_select",
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.circle('time', 'temperature', source=source)
    doc.add_root(plot)
show(bkapp)
After you have selected something, you can get the selected data as follows:
selected_data = df.iloc[source.selected.indices]
print(selected_data)
This should print the selected rows.
While out of scope for this question, note that there is a disconnect between Jupyter notebooks and the interactive nature of Bokeh apps: this solution introduces state that the notebook does not save, so restarting the kernel and executing all cells does not reproduce the same results. One way to tackle this is to persist the selection with pickle:
import os
import pickle

df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)
# restore a previously saved selection, if there is one
if os.path.isfile("selection.pickle"):
    with open("selection.pickle", mode="rb") as f:
        source.selected.indices = pickle.load(f)
... # interactive part
# save the current selection for the next session
with open("selection.pickle", mode="wb") as f:
    pickle.dump(source.selected.indices, f)
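The same persistence idea can be wrapped in two small functions so that the load and save steps are easy to reuse. This is a minimal sketch using only the standard library; load_selection and save_selection are hypothetical names, and a temporary directory stands in for the notebook's working folder.

```python
import os
import pickle
import tempfile


def load_selection(path):
    """Return the saved index list, or an empty list if nothing was saved yet."""
    if not os.path.isfile(path):
        return []
    with open(path, mode="rb") as f:
        return pickle.load(f)


def save_selection(path, indices):
    """Write the indices to `path`, copying them into a plain list first."""
    with open(path, mode="wb") as f:
        pickle.dump(list(indices), f)


# Round trip in a temporary directory, standing in for the notebook's folder.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "selection.pickle")
    before = load_selection(path)  # nothing has been saved yet
    save_selection(path, [4, 8, 15])
    after = load_selection(path)
```

In the notebook, one would assign load_selection("selection.pickle") to source.selected.indices before the interactive part and call save_selection("selection.pickle", source.selected.indices) after it.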