 

How do I extract data from a Bokeh ColumnDataSource?

I was trying to avoid using a ColumnDataSource and was instead passing pandas DataFrame columns directly to Bokeh plots.

Soon, though, I had to implement a HoverTool, which requires the data to be in a ColumnDataSource. So I started using a ColumnDataSource.

Now I am creating a box annotation, and I need the maximum value of a certain column from my data to define the top border of the box.

I can do that easily using pandas:

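from bokeh.models import BoxAnnotation
from bokeh.sampledata.iris import flowers  # assumed: the iris sample data as a pandas DataFrame

low_box = BoxAnnotation(
    top=flowers['petal_width'][flowers['species'] == 'setosa'].max(),
    fill_alpha=0.1, fill_color='red')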

But I can't figure out how to extract the maximum from a ColumnDataSource.

Is there a way to extract a maximum value from it, or is my approach all wrong in the first place?

asked Aug 01 '16 at 07:08 by multigoodverse



People also ask

What is the use of ColumnDataSource in bokeh?

The ColumnDataSource (CDS) is the core of most Bokeh plots. It provides the data to the glyphs of your plot. When you pass sequences like Python lists or NumPy arrays to a Bokeh renderer, Bokeh automatically creates a ColumnDataSource with this data for you.
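This also matters for the question above: even when you pass columns directly, the automatically created ColumnDataSource is still reachable through the renderer's data_source attribute. A minimal sketch, assuming a recent Bokeh release is installed; the x/y values are made up for illustration.

```python
from bokeh.plotting import figure

p = figure()

# Pass plain lists; Bokeh builds a ColumnDataSource behind the scenes
renderer = p.scatter(x=[1, 2, 3], y=[4, 5, 6])

# The automatically created source hangs off the glyph renderer
source = renderer.data_source
print(type(source).__name__)   # ColumnDataSource
print(max(source.data['y']))   # 6
```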

Does Bokeh work with pandas?

Using a pandas DataFrame: if you use a pandas DataFrame, the resulting ColumnDataSource in Bokeh will have columns that correspond to the columns of the DataFrame. Column naming follows a few rules; for example, if the DataFrame has a named index column, the ColumnDataSource will also have a column with that name.
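A short sketch of the named-index rule, assuming pandas and a recent Bokeh release are installed; the DataFrame and its index name "row" are made up for illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Illustrative DataFrame whose index has a name
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df.index.name = "row"

source = ColumnDataSource(df)

# The named index shows up as an extra column next to "x" and "y"
print(sorted(source.data.keys()))  # ['row', 'x', 'y']
```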


2 Answers

A ColumnDataSource object has a data attribute, which returns the columns as a Python dictionary mapping each column name to its sequence of values. You can therefore compute a maximum on a column just as you would on a list.

from bokeh.models import ColumnDataSource

# define ColumnDataSource
source = ColumnDataSource(
    data=dict(
        x=[1, 2, 3, 4, 5],
        y=[2, 5, 8, 2, 7],
        desc=['A', 'b', 'C', 'd', 'E'],
    )
)

# find the max of column 'x' in 'source'
print(max(source.data['x']))  # prints 5
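The question actually needs the maximum over a subset of rows (only the 'setosa' species), not over a whole column. A minimal sketch of two ways to get it from a ColumnDataSource, assuming Bokeh and pandas are installed; the small petal_width/species data below is a hypothetical stand-in for the iris data. Option 1 filters the columns of source.data in plain Python; option 2 uses ColumnDataSource.to_df() to get a pandas DataFrame back and reuses the question's pandas-style filtering.

```python
from bokeh.models import BoxAnnotation, ColumnDataSource

# Hypothetical stand-in for the iris data used in the question
source = ColumnDataSource(data=dict(
    petal_width=[0.2, 0.4, 1.3, 1.8, 0.3],
    species=['setosa', 'setosa', 'versicolor', 'virginica', 'setosa'],
))

data = source.data

# Option 1: pair the two columns row by row and keep only 'setosa' rows
setosa_max = max(
    width for width, species in zip(data['petal_width'], data['species'])
    if species == 'setosa'
)

# Option 2: convert the source back to a DataFrame and filter with pandas
df = source.to_df()
setosa_max_df = df.loc[df['species'] == 'setosa', 'petal_width'].max()

low_box = BoxAnnotation(top=setosa_max, fill_alpha=0.1, fill_color='red')
print(setosa_max, setosa_max_df)  # 0.4 0.4
```

Option 1 avoids the pandas round trip; option 2 is handier when the filter is more complex than a single equality test.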
answered Oct 23 '22 at 20:10 by benten



If the source was created from a pandas DataFrame, its columns are stored as NumPy arrays, so you can call their .max() method directly:

import pandas as pd
from bokeh.models import ColumnDataSource

source = ColumnDataSource(
    data=pd.DataFrame(dict(
        x=[1, 2, 3, 4, 5],
        y=[2, 5, 8, 2, 7],
        desc=['A', 'b', 'C', 'd', 'E'],
    ))
)
print(source.data['x'].max())  # prints 5
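Whether .max() is available depends on how the source was built. A minimal sketch, assuming NumPy, pandas and Bokeh are installed: a column passed as a plain list stays list-like (which is why the first answer uses the built-in max()), while a column taken from a DataFrame becomes a NumPy array. numpy.max accepts either kind.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource

values = [1, 2, 3, 4, 5]

list_source = ColumnDataSource(data=dict(x=values))
df_source = ColumnDataSource(data=pd.DataFrame(dict(x=values)))

# A column given as a list stays list-like, so it has no .max() method
print(isinstance(list_source.data['x'], list))      # True
# A column taken from a DataFrame is stored as a NumPy array
print(isinstance(df_source.data['x'], np.ndarray))  # True

# np.max works on either kind of column
print(np.max(list_source.data['x']), np.max(df_source.data['x']))  # 5 5
```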
answered Oct 23 '22 at 22:10 by InLaw
