I have a large dataset that I would like to plot in an IPython notebook. I read the ~0.5GB .csv file into a Pandas DataFrame using read_csv; this takes about two minutes. Then I try to plot this data.
import pandas as pd
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

data = pd.read_csv('large.csv')
output_notebook()
p1 = figure()
p1.circle(data.index, data['myDataset'])
show(p1)
My browser spins and does not show me any plots. I have tried the following:
using output_file() instead of output_notebook()
passing a ColumnDataSource object as the source argument to circle()
Bokeh claims on its website to offer "high-performance interactivity over very large or streaming datasets". How do I visualize these large datasets without my computer grinding to a halt?
The question is too broad to offer any specific code suggestions. I would be curious to know how many points were left after the downsampling you tried. Bokeh's default HTML canvas backend can definitely accommodate tens of thousands of circles. There are a few options:
For simple scatters and lines of hundreds of thousands of points, there is a WebGL backend that may be useful: http://docs.bokeh.org/en/latest/docs/user_guide/webgl.html
Using the Bokeh server, you can create a Bokeh app that downsamples the data before rendering it. There are some app examples here: https://github.com/bokeh/bokeh/tree/master/examples/app
The Datashader library can be used to downsample large data sets (hundreds of millions to billions of points), and it integrates very well with Bokeh.
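As a rough illustration of the first two ideas combined, here is a minimal sketch that thins the data in pandas before handing it to Bokeh and also turns on the WebGL backend. It is only one way to do it, under some assumptions: the downsample() helper and its max_points value are hypothetical names made up for this example, the DataFrame is synthetic and stands in for your large.csv, and scatter() is used in place of circle() (it draws circle markers by default). Plain striding like this can skip over narrow spikes, so treat it as a starting point rather than a faithful summary of the data.

```python
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show


def downsample(df, max_points=50_000):
    """Return at most max_points evenly spaced rows of df (hypothetical helper)."""
    if len(df) <= max_points:
        return df
    step = -(-len(df) // max_points)  # ceiling division
    return df.iloc[::step]


# Synthetic stand-in for the large CSV; replace with pd.read_csv('large.csv').
rng = np.random.default_rng(0)
data = pd.DataFrame({"myDataset": rng.normal(size=1_000_000).cumsum()})

# Keep every 20th row here, which leaves 50,000 points to draw.
small = downsample(data)

# output_backend="webgl" asks Bokeh to render the glyphs with WebGL.
p1 = figure(output_backend="webgl")
p1.scatter(small.index, small["myDataset"])

# show(p1)  # uncomment to render; call output_notebook() first in a notebook
```

If the thinned plot still looks right at 50,000 points, you can raise max_points until the browser starts to struggle, which gives you a feel for what your machine can handle.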