I am plotting time series data using Matplotlib, and some of the data is missing from the sequence. Matplotlib implicitly joins the last contiguous data point to the next one, so wherever data is missing the plot looks ugly. The following is the plot obtained.
It can be seen that near the April 30th marker, data is missing and Matplotlib joins the points across the gap. The following image is a scatter plot of the same data. The scatter plot hides this fault, but then contiguous data points are not joined either. Moreover, a scatter plot is very slow given the huge number of data points involved.
What is the recommended solution for such problems?
If you can identify where the break points should be, you can either plot each contiguous section separately or insert np.nan in the data in the gaps. See for example Plot periodic trajectories.
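A minimal sketch of the np.nan approach, assuming numeric x values and a threshold chosen by hand: the helper name insert_gap_nans and the max_gap parameter are made up for this illustration and are not part of NumPy or Matplotlib. For datetime x values the same idea applies, but the arithmetic on x would need adapting.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt


def insert_gap_nans(x, y, max_gap):
    """Insert a NaN sample wherever consecutive x values are more than max_gap apart.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Positions in the original arrays *before which* a NaN should be inserted.
    breaks = np.flatnonzero(np.diff(x) > max_gap) + 1
    # Put the dummy sample halfway across each gap on the x axis.
    x_mid = (x[breaks - 1] + x[breaks]) / 2.0
    x_out = np.insert(x, breaks, x_mid)
    y_out = np.insert(y, breaks, np.nan)
    return x_out, y_out


# Toy data with one gap between x=3 and x=10.
x = np.array([0, 1, 2, 3, 10, 11, 12])
y = np.array([1.0, 2.0, 1.5, 2.5, 3.0, 2.0, 2.5])
x_gap, y_gap = insert_gap_nans(x, y, max_gap=1.5)

fig, ax = plt.subplots()
ax.plot(x_gap, y_gap)  # the line is broken at the NaN instead of bridging the gap
```

Matplotlib does not draw a segment to or from a NaN value, so the two contiguous runs show up as separate lines.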
You can get the same effect as scatter (if you don't want to scale the size or color of each point independently) with
ax.plot(x, y, linestyle='none', marker='o')
As the previous answer says, you should insert NaNs where there is no data. This answer is specific to Pandas and explains how this can be achieved easily, using either Series.resample() or Series.reindex().
The simplest method to use is resample(). This is the most concise way for regularly spaced data. So in your example above, if you have e.g. 5-minute data, just do data.resample("5min").mean(). (In current pandas, resample() returns a Resampler object, so an aggregation such as .mean() or .asfreq() is needed to get a Series back.) This will return your data set with NaN in the slots where samples are missing.
The only case where this doesn't work too well is when your samples are not regularly-spaced.
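As a rough illustration of the resample() route, here is a sketch on synthetic data. The dates, values, and the choice of .mean() as the aggregation are assumptions made for the example, not taken from the question's data set.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series with four consecutive samples removed.
index = pd.date_range("2024-04-30 00:00", periods=12, freq="5min")
data = pd.Series(np.arange(12, dtype=float), index=index)
data = data.drop(index[4:8])  # simulate the missing stretch

# Resampling onto the regular 5-minute grid leaves NaN in the empty bins.
regular = data.resample("5min").mean()
```

Plotting regular instead of data then leaves a visible break over the missing stretch, because the NaN values interrupt the line.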
The alternative is reindex(), which also works for ordered (but non-time-series) data. So for example, if you had a data set indexed with integers from 0 to 100, but with a few missing samples, you could do data.reindex(range(101)). You can also replicate the behaviour of resample() with reindex() by passing the result of pandas.date_range() as the argument.
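Both uses of reindex() can be sketched as follows. The series contents and the positions of the missing labels are invented for the example, and the 5-minute frequency is an assumption carried over from the resample() discussion.

```python
import numpy as np
import pandas as pd

# Integer-indexed data, 0..100, with labels 40, 41 and 42 missing.
present = [i for i in range(101) if i not in (40, 41, 42)]
data = pd.Series(np.sin(np.array(present) / 10.0), index=present)
full = data.reindex(np.arange(101))  # missing labels come back as NaN

# Time series: build the complete index with date_range and reindex onto it.
grid = pd.date_range("2024-04-30 00:00", periods=6, freq="5min")
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=grid[[0, 1, 4, 5]])
ts_full = ts.reindex(pd.date_range(ts.index[0], ts.index[-1], freq="5min"))
```

Unlike resample(), reindex() does no aggregation: it only keeps samples whose labels match the new index exactly, so it suits data that already sits on the target grid.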