Get the attributes of the selected item in a GeoJSONDataSource

Tags: python, bokeh

I want to link a plot containing patches (from a GeoJSONDataSource) with a line chart, but I'm having trouble getting the attributes of the selected patch.

It's basically a plot showing polygons, and when a polygon is selected, I want to update the line chart with a time series of data for that polygon. The line chart is driven by a normal ColumnDataSource.

I can get the indices of the selected patch by adding a callback combined with geo_source.selected['1d']['indices']. But how do I get the data/attributes that correspond to that index? I need to get a 'key' from the attributes, which I can then use to update the line chart.

The GeoJSONDataSource has no data attribute in which I can look up the data itself. Bokeh can use the attributes for things like coloring and tooltips, so I assume there must be a way to get these out of the GeoJSONDataSource; unfortunately, I can't find it.

Edit:

Here is a working toy example showing what I've got so far.

import pandas as pd
import numpy as np

from bokeh import events
from bokeh.models import (Select, Column, Row, ColumnDataSource, HoverTool, 
                          Range1d, LinearAxis, GeoJSONDataSource)
from bokeh.plotting import figure
from bokeh.io import curdoc

import os
import datetime
from collections import OrderedDict

def make_plot(src):
    # function to create the line chart

    p = figure(width=500, height=200, x_axis_type='datetime', title='Some parameter',
               tools=['xwheel_zoom', 'xpan'], logo=None, toolbar_location='below', toolbar_sticky=False)
    
    p.circle('index', 'var1', color='black', fill_alpha=0.2, size=10, source=src)

    return p

def make_geo_plot(src):
    # function to create the spatial plot with polygons
    
    p = figure(width=300, height=300, title="Select area", tools=['tap', 'pan', 'box_zoom', 'wheel_zoom','reset'], logo=None)

    p.patches('xs', 'ys', fill_alpha=0.2, fill_color='black',
              line_color='black', line_width=0.5, source=src)
              
    p.on_event(events.SelectionGeometry, update_plot_from_geo)

    return p

def update_plot_from_geo(event):
    # update the line chart based on the selected polygon

    selected = geo_source.selected['1d']['indices']
    
    if (len(selected) > 0):
        first = selected[0]
        print(geo_source.selected['1d']['indices'])


def update_plot(attrname, old, new):
    # Callback for the dropdown menu which updates the line chart
    new_src = get_source(df, area_select.value)    
    src.data.update(new_src.data)
  
def get_source(df, fieldid):
    # function to get a subset of the multi-hierarchical DataFrame
    
    # slice 'out' the selected area
    dfsub = df.xs(fieldid, axis=1, level=0)
    src = ColumnDataSource(dfsub)
    
    return src

# example timeseries
n_points = 100
df = pd.DataFrame({('area_a','var1'): np.sin(np.linspace(0,5,n_points)) + np.random.rand(100)*0.1,
                   ('area_b','var1'): np.sin(np.linspace(0,2,n_points)) + np.random.rand(100)*0.1,
                   ('area_c','var1'): np.sin(np.linspace(0,3,n_points)) + np.random.rand(100)*0.1,
                   ('area_d','var1'): np.sin(np.linspace(0,4,n_points)) + np.random.rand(100)*0.1},
                  # pd.date_range replaces DatetimeIndex(start=...), which newer pandas no longer accepts
                  index=pd.date_range(start='2017-01-01', freq='D', periods=n_points))

# example polygons
geojson = """{
"type":"FeatureCollection",
"crs":{"type":"name","properties":{"name":"urn:ogc:def:crs:OGC:1.3:CRS84"}},
"features":[
{"type":"Feature","properties":{"key":"area_a"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-108.8,42.7],[-104.5,42.0],[-108.3,39.3],[-108.8,42.7]]]]}},
{"type":"Feature","properties":{"key":"area_b"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-106.3,44.0],[-106.2,42.6],[-103.3,42.6],[-103.4,44.0],[-106.3,44.0]]]]}},
{"type":"Feature","properties":{"key":"area_d"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-104.3,41.0],[-101.5,41.0],[-102.9,37.8],[-104.3,41.0]]]]}},
{"type":"Feature","properties":{"key":"area_c"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-105.8,40.3],[-108.3,37.7],[-104.0,37.4],[-105.8,40.3]]]]}}
]
}"""

geo_source = GeoJSONDataSource(geojson=geojson)

# populate a drop down menu with the area's 
area_ids = sorted(df.columns.get_level_values(0).unique().values.tolist())
area_ids = [str(x) for x in area_ids]
area_select = Select(value=area_ids[0], title='Select area', options=area_ids)
area_select.on_change('value', update_plot)

src = get_source(df, area_select.value)

p = make_plot(src)
pgeo = make_geo_plot(geo_source)

# add to document
curdoc().add_root(Row(Column(area_select, p), pgeo))

Save the code in a .py file and load it with bokeh serve example.py --show

Rutger Kassies asked Nov 10 '17 14:11


2 Answers

You should write a custom extension for the GeoJSONDataSource.

Here is the CoffeeScript for GeoJSONDataSource: https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/sources/geojson_data_source.coffee

I am not very good with custom extensions, so I just copied GeoJSONDataSource completely and called it CustomGeo instead. Then I moved 'data' from @internal to @define. Bingo: you've got yourself a GeoJSONDataSource with a 'data' attribute.

In the example below I did the callback using the 'key' list. Since you now have the data like this, you could write something to double-check that it corresponds to the appropriate polygon if you are worried about shuffling.

import pandas as pd
import numpy as np

from bokeh.core.properties import Instance, Dict, JSON, Any

from bokeh import events
from bokeh.models import (Select, Column, Row, ColumnDataSource, HoverTool, 
                          Range1d, LinearAxis, GeoJSONDataSource, ColumnarDataSource)
from bokeh.plotting import figure
from bokeh.io import curdoc

import os
import datetime
from collections import OrderedDict

def make_plot(src):
    # function to create the line chart

    p = figure(width=500, height=200, x_axis_type='datetime', title='Some parameter',
               tools=['xwheel_zoom', 'xpan'], logo=None, toolbar_location='below', toolbar_sticky=False)

    p.circle('index', 'var1', color='black', fill_alpha=0.2, size=10, source=src)

    return p

def make_geo_plot(src):
    # function to create the spatial plot with polygons

    p = figure(width=300, height=300, title="Select area", tools=['tap', 'pan', 'box_zoom', 'wheel_zoom','reset'], logo=None)

    p.patches('xs', 'ys', fill_alpha=0.2, fill_color='black',
              line_color='black', line_width=0.5, source=src, name='poly')

    p.on_event(events.SelectionGeometry, update_plot_from_geo)

    return p

def update_plot_from_geo(event):
    # update the line chart based on the selected polygon

    try:
        selected = geo_source.selected['1d']['indices'][0]
    except IndexError:
        # nothing selected, so leave the line chart unchanged
        return

    print(geo_source.data)
    print(geo_source.data['key'][selected])

    new_src = get_source(df,geo_source.data['key'][selected])
    src.data.update(new_src.data)

def update_plot(attrname, old, new):
    # Callback for the dropdown menu which updates the line chart
    print(area_select.value)
    new_src = get_source(df, area_select.value)    
    src.data.update(new_src.data)

def get_source(df, fieldid):
    # function to get a subset of the multi-hierarchical DataFrame

    # slice 'out' the selected area
    dfsub = df.xs(fieldid, axis=1, level=0)
    src = ColumnDataSource(dfsub)

    return src

# example timeseries
n_points = 100
df = pd.DataFrame({('area_a','var1'): np.sin(np.linspace(0,5,n_points)) + np.random.rand(100)*0.1,
                   ('area_b','var1'): np.sin(np.linspace(0,2,n_points)) + np.random.rand(100)*0.1,
                   ('area_c','var1'): np.sin(np.linspace(0,3,n_points)) + np.random.rand(100)*0.1,
                   ('area_d','var1'): np.sin(np.linspace(0,4,n_points)) + np.random.rand(100)*0.1},
                  # pd.date_range replaces DatetimeIndex(start=...), which newer pandas no longer accepts
                  index=pd.date_range(start='2017-01-01', freq='D', periods=n_points))

# example polygons
geojson = """{
"type":"FeatureCollection",
"crs":{"type":"name","properties":{"name":"urn:ogc:def:crs:OGC:1.3:CRS84"}},
"features":[
{"type":"Feature","properties":{"key":"area_a"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-108.8,42.7],[-104.5,42.0],[-108.3,39.3],[-108.8,42.7]]]]}},
{"type":"Feature","properties":{"key":"area_b"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-106.3,44.0],[-106.2,42.6],[-103.3,42.6],[-103.4,44.0],[-106.3,44.0]]]]}},
{"type":"Feature","properties":{"key":"area_d"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-104.3,41.0],[-101.5,41.0],[-102.9,37.8],[-104.3,41.0]]]]}},
{"type":"Feature","properties":{"key":"area_c"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-105.8,40.3],[-108.3,37.7],[-104.0,37.4],[-105.8,40.3]]]]}}
]
}"""

implementation = """
import {ColumnarDataSource} from "models/sources/columnar_data_source"
import {logger} from "core/logging"
import * as p from "core/properties"

export class CustomGeo extends ColumnarDataSource
  type: 'CustomGeo'

  @define {
    geojson: [ p.Any     ] # TODO (bev)
    data:    [ p.Any,   {} ]
  }

  initialize: (options) ->
    super(options)
    @_update_data()
    @connect(@properties.geojson.change, () => @_update_data())

  _update_data: () -> @data = @geojson_to_column_data()

  _get_new_list_array: (length) -> ([] for i in [0...length])

  _get_new_nan_array: (length) -> (NaN for i in [0...length])

  _flatten_function: (accumulator, currentItem) ->
    return accumulator.concat([[NaN, NaN, NaN]]).concat(currentItem)

  _add_properties: (item, data, i, item_count) ->
    for property of item.properties
      if !data.hasOwnProperty(property)
        data[property] = @_get_new_nan_array(item_count)
      data[property][i] = item.properties[property]

  _add_geometry: (geometry, data, i) ->

    switch geometry.type

      when "Point"
        coords = geometry.coordinates
        data.x[i] = coords[0]
        data.y[i] = coords[1]
        data.z[i] = coords[2] ? NaN

      when "LineString"
        coord_list = geometry.coordinates
        for coords, j in coord_list
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      when "Polygon"
        if geometry.coordinates.length > 1
          logger.warn('Bokeh does not support Polygons with holes in, only exterior ring used.')
        exterior_ring = geometry.coordinates[0]
        for coords, j in exterior_ring
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      when "MultiPoint"
        logger.warn('MultiPoint not supported in Bokeh')

      when "MultiLineString"
        flattened_coord_list = geometry.coordinates.reduce(@_flatten_function)
        for coords, j in flattened_coord_list
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      when "MultiPolygon"
        exterior_rings = []
        for polygon in geometry.coordinates
          if polygon.length > 1
            logger.warn('Bokeh does not support Polygons with holes in, only exterior ring used.')
          exterior_rings.push(polygon[0])

        flattened_coord_list = exterior_rings.reduce(@_flatten_function)
        for coords, j in flattened_coord_list
          data.xs[i][j] = coords[0]
          data.ys[i][j] = coords[1]
          data.zs[i][j] = coords[2] ? NaN

      else
        throw new Error('Invalid type ' + geometry.type)

  _get_items_length: (items) ->
    count = 0
    for item, i in items
      geometry = if item.type == 'Feature' then item.geometry else item
      if geometry.type == 'GeometryCollection'
        for g, j in geometry.geometries
          count += 1
      else
        count += 1
    return count

  geojson_to_column_data: () ->
    geojson = JSON.parse(@geojson)

    if geojson.type not in ['GeometryCollection', 'FeatureCollection']
      throw new Error('Bokeh only supports type GeometryCollection and FeatureCollection at top level')

    if geojson.type == 'GeometryCollection'
      if not geojson.geometries?
        throw new Error('No geometries found in GeometryCollection')
      if geojson.geometries.length == 0
        throw new Error('geojson.geometries must have one or more items')
      items = geojson.geometries

    if geojson.type == 'FeatureCollection'
      if not geojson.features?
        throw new Error('No features found in FeaturesCollection')
      if geojson.features.length == 0
        throw new Error('geojson.features must have one or more items')
      items = geojson.features

    item_count = @_get_items_length(items)

    data = {
      'x': @_get_new_nan_array(item_count),
      'y': @_get_new_nan_array(item_count),
      'z': @_get_new_nan_array(item_count),
      'xs': @_get_new_list_array(item_count),
      'ys': @_get_new_list_array(item_count),
      'zs': @_get_new_list_array(item_count)
    }

    arr_index = 0
    for item, i in items
      geometry = if item.type == 'Feature' then item.geometry else item

      if geometry.type == 'GeometryCollection'
        for g, j in geometry.geometries
          @_add_geometry(g, data, arr_index)
          if item.type == 'Feature'
            @_add_properties(item, data, arr_index, item_count)
          arr_index += 1
      else
        # Now populate based on Geometry type
        @_add_geometry(geometry, data, arr_index)
        if item.type == 'Feature'
          @_add_properties(item, data, arr_index, item_count)

        arr_index += 1

    return data

"""

class CustomGeo(ColumnarDataSource):
  __implementation__ = implementation

  geojson = JSON(help="""
  GeoJSON that contains features for plotting. Currently GeoJSONDataSource can
  only process a FeatureCollection or GeometryCollection.
  """)

  data = Dict(Any,Any,default={},help="wooo")

geo_source = CustomGeo(geojson=geojson)

# populate a drop down menu with the area's 
area_ids = sorted(df.columns.get_level_values(0).unique().values.tolist())
area_ids = [str(x) for x in area_ids]
area_select = Select(value=area_ids[0], title='Select area', options=area_ids)
area_select.on_change('value', update_plot)

src = get_source(df, area_select.value)

p = make_plot(src)
pgeo = make_geo_plot(geo_source)

# add to document
curdoc().add_root(Row(Column(area_select, p), pgeo))
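
For reference, the part of the CoffeeScript above that turns feature properties into columns (_add_properties) can be mirrored in a few lines of plain Python. This is only a sketch of that logic, assuming a FeatureCollection in which no feature is a GeometryCollection, so that row i corresponds to features[i]. The helper name properties_to_columns and the two-feature sample are made up for illustration and are not part of Bokeh.

```python
import json
import math


def properties_to_columns(geojson_str):
    """Sketch of the _add_properties logic: one column per property name.

    Hypothetical helper, not Bokeh API. Assumes a FeatureCollection whose
    features are not GeometryCollections, so row i maps to features[i].
    """
    features = json.loads(geojson_str)["features"]
    columns = {}
    for i, feature in enumerate(features):
        for name, value in (feature.get("properties") or {}).items():
            # Rows where a feature lacks the property stay NaN,
            # mirroring the NaN-filled arrays in the CoffeeScript.
            columns.setdefault(name, [math.nan] * len(features))[i] = value
    return columns


# Made-up sample with two features; the second has an extra property.
example = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"key": "area_a"},
         "geometry": {"type": "Point", "coordinates": [0, 0]}},
        {"type": "Feature", "properties": {"key": "area_b", "name": "B"},
         "geometry": {"type": "Point", "coordinates": [1, 1]}},
    ],
})

columns = properties_to_columns(example)
print(columns["key"])
```

Comparing a list built this way with geo_source.data['key'] is one possible way to double-check that the rows were not shuffled.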

Seb answered Nov 14 '22 11:11


The geojson data that you pass to GeoJSONDataSource is stored in its geojson property -- as a string. My suggestion isn't particularly elegant: you can just parse the json string using the built-in json module. Here's a working version of update_plot_from_geo that updates the line plot based on the selected polygon:

def update_plot_from_geo(event):
    # update the line chart based on the selected polygon

    indices = geo_source.selected['1d']['indices']

    if indices:
        parsed_geojson = json.loads(geo_source.geojson)
        features = parsed_geojson['features']
        series_key = features[indices[0]]['properties']['key']
        new_source = get_source(df, series_key)
        src.data.update(new_source.data)

You'll also need to import json at the top.
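
The lookup itself does not depend on Bokeh, so it can be tried in isolation. Here is a minimal sketch, assuming the selected index refers to the feature at the same position in the features list. The helper name property_for_index and the two-feature sample string are hypothetical, introduced only for illustration:

```python
import json


def property_for_index(geojson_str, index, name="key"):
    """Return one property of the feature at position `index`.

    Hypothetical helper; assumes row `index` of the data source maps to
    features[index] in the GeoJSON string.
    """
    features = json.loads(geojson_str)["features"]
    return features[index]["properties"][name]


# Made-up sample; the question's polygons would work the same way.
sample = """{
"type": "FeatureCollection",
"features": [
{"type": "Feature", "properties": {"key": "area_a"},
 "geometry": {"type": "Point", "coordinates": [0, 0]}},
{"type": "Feature", "properties": {"key": "area_d"},
 "geometry": {"type": "Point", "coordinates": [1, 1]}}
]
}"""

print(property_for_index(sample, 1))  # area_d
```

If the callback fires often, parsing the string once and keeping the resulting list of keys around would avoid repeating json.loads on every selection.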

I'm a little surprised there's not an obvious way to get the parsed json data. The GeoJSONDataSource documentation indicates the existence of the geojson attribute, but says it's a JSON object. The JSON documentation seems to hint that you should be able to do something like src.geojson.parse. But the type of geojson is just str. Upon closer inspection, it appears that the docs are using "JSON" ambiguously, referring to the Bokeh JSON class in some cases, and to the built-in JavaScript JSON object in others.

So at the moment, I don't believe there's a better way to get at this data.

senderle answered Nov 14 '22 11:11