
What is the best method for using Datashader to plot data from a NumPy array?

In following the Datashader example notebook demonstrating lines, the input is a Pandas DataFrame (though it seems a Dask DataFrame would work as well). My data is in a NumPy array. Can I use Datashader to plot lines from NumPy arrays without first putting them into a DataFrame?

The documentation for the line glyph seems to indicate this is possible, but I did not find an example. The example notebook I linked to uses Canvas.line, which I did not find in the documentation.

Steven C. Howell asked Feb 10 '17 14:02



1 Answer

I did not find a way to plot data in a NumPy array without first putting it into a DataFrame. How to do this was not especially intuitive: it seems Datashader requires the column labels to be non-numeric strings so that they can be accessed with the df.col_label syntax rather than the df[col_label] syntax (perhaps there is a good reason for this).

With the current system, I had to do the following to get the NumPy array into a DataFrame with column labels that Datashader would accept.

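import collections

import datashader
import datashader.transfer_functions
import pandas as pd
import xarray

# data: 2-D NumPy array, assumed to hold one curve per row
# x_values: 1-D array of x positions shared by all curves
df = pd.DataFrame(data=data.T)
data_cols = ['c{}'.format(c) for c in df.columns]
df.columns = data_cols
df['x'] = x_values

y_range = data.min(), data.max()
x_range = x_values[0], x_values[-1]

canvas = datashader.Canvas(x_range=x_range, y_range=y_range,
                           plot_height=300, plot_width=900)
# One aggregate per curve, using the 'x' column added above
aggs = collections.OrderedDict((c, canvas.line(df, 'x', c)) for c in data_cols)

merged = xarray.concat(list(aggs.values()), dim=pd.Index(data_cols, name='cols'))
img = datashader.transfer_functions.shade(merged.sum(dim='cols'),
                                          how='eq_hist')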

Note that it was important to use the data_cols variable rather than simply df.columns, because the list of y columns must exclude the x column (which was not initially intuitive).
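The DataFrame preparation described above can be wrapped in a small helper. This is only a sketch using NumPy and pandas: the function name array_to_line_frame, the prefix argument, and the synthetic sine data are hypothetical and not part of Datashader. It assumes the array holds one curve per row, as the transpose in the code above suggests.

```python
import numpy as np
import pandas as pd


def array_to_line_frame(data, x_values, prefix="c"):
    """Wrap a 2-D array (one curve per row) in a DataFrame with string labels.

    Returns the DataFrame and the list of y-column labels, which
    deliberately excludes the shared 'x' column.
    """
    data = np.asarray(data)
    x_values = np.asarray(x_values)
    if data.ndim != 2 or data.shape[1] != len(x_values):
        raise ValueError("expected data with shape (n_curves, len(x_values))")

    # Transpose so each curve becomes a column, labeled 'c0', 'c1', ...
    y_cols = ["{}{}".format(prefix, i) for i in range(data.shape[0])]
    df = pd.DataFrame(data.T, columns=y_cols)

    # Add the shared x column last; y_cols does not include it.
    df["x"] = x_values
    return df, y_cols


# Synthetic example (hypothetical data): three sine curves at five x positions.
x_values = np.linspace(0.0, 1.0, 5)
data = np.vstack([np.sin(2 * np.pi * (x_values + shift))
                  for shift in (0.0, 0.1, 0.2)])
df, y_cols = array_to_line_frame(data, x_values)
```

Each label in y_cols could then be passed to canvas.line(df, 'x', label) as in the code above, without having to filter the x column out of df.columns.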

Here is an example of the resulting image, with axes added using Bokeh. [image]

Steven C. Howell answered Sep 24 '22 20:09