
Matplotlib remove interpolation for missing data

I am plotting time-series data using Matplotlib, and some of the data is missing from the sequence. Matplotlib implicitly joins the last contiguous data point to the next one, so wherever data is missing the plot looks ugly. The following is the plot obtained. [image: line plot]

It can be seen that near the April 30th marker data is missing and Matplotlib joins the points across the gap. The following image is a scatter plot of the same data. A scatter plot hides this fault, but then contiguous data points are not joined. Moreover, a scatter plot is very slow given the huge number of data points involved. [image: scatter plot]

What is the recommended solution for such problems?

Nipun Batra asked May 09 '13 04:05


2 Answers

If you can identify where the break points should be, you can either:

  1. break the data and plot each 'section' by hand
  2. insert np.nan in the data in the gaps

See for example Plot periodic trajectories.
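As a rough illustration of the second option, the sketch below inserts a NaN point wherever two consecutive x values are further apart than a threshold. The helper name `insert_nan_at_gaps` and the `max_gap` parameter are made up for this example, and it assumes numeric x values (datetimes would need converting first).

```python
import numpy as np

def insert_nan_at_gaps(x, y, max_gap):
    """Return copies of x and y with a NaN point inserted wherever
    consecutive x values are more than max_gap apart.

    Hypothetical helper for illustration; assumes numeric, sorted x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Indices i where the step from x[i] to x[i + 1] exceeds max_gap.
    gap_idx = np.where(np.diff(x) > max_gap)[0]
    # np.insert places the new value before the given index, so shift by one.
    x_out = np.insert(x, gap_idx + 1, np.nan)
    y_out = np.insert(y, gap_idx + 1, np.nan)
    return x_out, y_out

# Toy data with a gap between x=2 and x=10.
x = [0, 1, 2, 10, 11]
y = [5, 6, 7, 8, 9]
x_gapped, y_gapped = insert_nan_at_gaps(x, y, max_gap=1.5)
```

Passing `x_gapped` and `y_gapped` to `ax.plot` should then draw two separate line segments, since Matplotlib does not draw a line through NaN points.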

You can get the same effect of scatter (if you don't want to scale the size or color of each point independently) with

ax.plot(x, y, linestyle='none', marker='o')

tacaswell answered Sep 27 '22 16:09


As the previous answer says, you should insert NaNs where there is no data. This answer is specific to pandas and explains how this can be achieved easily, using either:

  • Series.resample() or
  • Series.reindex()

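The simplest method to use is resample(). This is the most concise way for regularly spaced data. So in your example above, if you have e.g. 5-minute data, just do data.resample("5min").asfreq(). (In current pandas versions resample() returns a Resampler object, so a method such as asfreq() or mean() has to follow it.) This returns your data set on a regular 5-minute grid, with NaN in place of the missing values.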

The only case where this doesn't work too well is when your samples are not regularly-spaced.
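A minimal sketch of the resample() approach, assuming a recent pandas version and an invented 5-minute series (the timestamps and values below are made up for illustration):

```python
import pandas as pd

# Hypothetical 5-minute series with the 00:10 and 00:15 samples missing.
index = pd.to_datetime([
    "2013-04-30 00:00",
    "2013-04-30 00:05",
    "2013-04-30 00:20",
    "2013-04-30 00:25",
])
data = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# asfreq() puts the series on a regular 5-minute grid without aggregating;
# grid slots that have no sample become NaN.
regular = data.resample("5min").asfreq()
```

Plotting `regular` instead of `data` should leave a visible break over the two missing slots. If several raw samples can fall into one 5-minute bin, an aggregation such as `.mean()` may be a better fit than `.asfreq()`.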

The alternative is reindex(), which also works for ordered (but non-time-series) data. So for example, if you had a data set indexed with integers from 0 to 100, but with a few missing samples, you could do data.reindex(range(101)). You can also replicate the behaviour of resample() with reindex(), by passing the result of pandas.date_range() as the new index.
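Both uses of reindex() can be sketched as follows. The series, their values, and the 5-minute spacing are invented for the example.

```python
import pandas as pd

# Hypothetical integer-indexed series with labels 3 and 4 missing.
data = pd.Series([10.0, 11.0, 12.0, 15.0, 16.0], index=[0, 1, 2, 5, 6])
# Reindex onto the full label range 0..6; absent labels get NaN.
filled = data.reindex(range(7))

# Hypothetical time series with the 00:10 sample missing.
ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime([
        "2013-04-30 00:00",
        "2013-04-30 00:05",
        "2013-04-30 00:15",
    ]),
)
# Build the full 5-minute grid and reindex onto it.
grid = pd.date_range(ts.index[0], ts.index[-1], freq="5min")
ts_filled = ts.reindex(grid)
```

Note that reindex() only keeps samples whose labels fall exactly on the new index, so for timestamps that are slightly off-grid this approach would drop data.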

Luciano answered Sep 27 '22 17:09