
Matplotlib slow with large data sets, how to enable decimation?

I use matplotlib for a signal processing application and I noticed that it chokes on large data sets. This is something that I really need to improve to make it a usable application.

What I'm looking for is a way to let matplotlib decimate my data. Is there a setting, property or other simple way to enable that? Any suggestions on how to implement this are welcome.

Some code:

import numpy as np
import matplotlib.pyplot as plt

n = 100000  # more than 100,000 points makes it unusably slow
plt.plot(np.random.random_sample(n))
plt.show()

Some background information

I used to work on a large C++ application where we needed to plot large datasets. To solve this problem we took advantage of the structure of the data as follows:

In most cases, if we want a line plot then the data is ordered and often even equidistant. If it is equidistant, you can calculate the start and end index in the data array directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
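A minimal sketch of that index lookup, assuming x holds the ordered x-coordinates and (x0, x1) is the visible range taken from the zoom rectangle:

import numpy as np

x = np.sort(np.random.random_sample(1_000_000))   # ordered but not equidistant
x0, x1 = 0.25, 0.75                               # visible x-range from the zoom rectangle

# Ordered but not equidistant: binary search, O(log n)
i0 = np.searchsorted(x, x0, side='left')
i1 = np.searchsorted(x, x1, side='right')

# Equidistant: compute the indices directly from the spacing instead
# dx = x[1] - x[0]
# i0 = int((x0 - x[0]) / dx)
# i1 = int((x1 - x[0]) / dx) + 1

x_visible = x[i0:i1]                              # the zoomed slice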

Next, the zoomed slice is decimated. Because the data is ordered, we can simply iterate over the blocks of points that fall inside a single pixel, and for each block calculate the mean, maximum and minimum. Instead of one point per pixel, we then draw a bar in the plot.

For example: if the x-axis is ordered, a vertical line is drawn for each block, possibly with the mean overlaid in a different color.

To avoid aliasing, the plot is oversampled by a factor of two.
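A minimal numpy sketch of this min/max/mean decimation, assuming y is the zoomed slice and the axes are roughly 800 pixels wide; the factor of two is the oversampling mentioned above, and decimate_minmax is a hypothetical helper, not a matplotlib API:

import numpy as np
import matplotlib.pyplot as plt

def decimate_minmax(y, n_bins):
    """Reduce y to per-bin min, max and mean (hypothetical helper)."""
    n = (len(y) // n_bins) * n_bins      # drop the ragged tail
    yb = y[:n].reshape(n_bins, -1)       # one row per pixel-sized block
    return yb.min(axis=1), yb.max(axis=1), yb.mean(axis=1)

y = np.random.random_sample(10_000_000)
n_bins = 2 * 800                         # 2x oversampling to avoid aliasing
ymin, ymax, ymean = decimate_minmax(y, n_bins)
xc = np.linspace(0, len(y), n_bins)      # bin centres on the original x scale

plt.vlines(xc, ymin, ymax)               # one vertical bar per block
plt.plot(xc, ymean, color='r')           # mean overlaid in a different color
plt.show()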

If it is a scatter plot, the data can be made ordered by sorting, because the order in which the points are plotted does not matter.
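A minimal sketch of that sorting step, assuming x and y are the scatter coordinates:

import numpy as np

x = np.random.random_sample(1_000_000)
y = np.random.random_sample(1_000_000)
order = np.argsort(x)          # plotting order is irrelevant for a scatter plot
x, y = x[order], y[order]      # now ordered in x, so the recipe above applies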

The nice thing about this simple recipe is that the more you zoom in, the faster it becomes. In my experience, as long as the data fits in memory the plots stay very responsive. For instance, 20 plots of time-history data with 10 million points should be no problem.

asked Nov 03 '13 by Luke

1 Answer

It seems like you just need to decimate the data before you plot it:

import numpy as np
import matplotlib.pyplot as plt

n = 100000  # more than 100,000 points makes it unusably slow
X = np.random.random_sample(n)
i = 10 * np.arange(n // 10)  # indices of every 10th sample
plt.plot(X[i])
plt.show()
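Equivalently, a plain slice gives the same every-tenth-point decimation without building an index array:

plt.plot(X[::10])
plt.show()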
answered Nov 14 '22 by Chris Flesher