
How to plot a time series array, with confidence intervals displayed, in python?

I have some time series that slowly increase overall but are very wavy over short periods of time. For example, a time series could look like:

[10 + np.random.rand() for i in range(100)] + [12 + np.random.rand() for i in range(100)] + [14 + np.random.rand() for i in range(100)] 

I would like to plot the time series with a focus on the general trend, not on the small waves. Is there a way to plot the mean over a moving period of time, surrounded by a band indicating the waves (the band should represent the confidence interval, i.e. where the data point could be at that moment)?

A simple plot would look like this:

[image: simple line plot of the raw time series]

The plot I would like, with confidence intervals, would look like this:

[image: mean curve surrounded by a shaded confidence band]

Is there an elegant way to do it in Python?

asked May 03 '18 17:05 by Ștefan



2 Answers

You could use the pandas rolling(n) method to compute the mean and standard deviation over windows of n consecutive points.
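To make the behaviour of rolling(n) concrete, here is a small sketch on a hand-checkable series (the values are made up for illustration). By default the window is trailing: each result uses the current point and the n - 1 points before it, so the first n - 1 results are NaN. The std is the sample standard deviation (ddof=1).

```python
import pandas as pd

# A tiny series so the rolling results are easy to check by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# rolling(3) covers the current point and the 2 before it (a trailing window);
# the first 2 positions have fewer than 3 points, so they are NaN.
rolling_mean = s.rolling(3).mean()
rolling_std = s.rolling(3).std()  # sample std (ddof=1) by default

print(rolling_mean.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
# Each full window (e.g. 1, 2, 3) has a sample std of 1.0, up to rounding.
print(rolling_std.tolist())
```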

For the shaded confidence band (the area between the lower and upper standard-deviation curves), you can use the function fill_between() from matplotlib.pyplot. For more information, take a look over here; the following code is inspired by it.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Declare the array containing the series you want to plot.
# For example:
time_series_array = np.sin(np.linspace(-np.pi, np.pi, 400)) + np.random.rand(400)
n_steps = 15  # window length (number of points) for the rolling mean/std

# Compute curves of interest:
time_series_df = pd.DataFrame(time_series_array)
smooth_path = time_series_df.rolling(n_steps).mean()
path_deviation = 2 * time_series_df.rolling(n_steps).std()  # band half-width: 2 standard deviations

under_line = (smooth_path - path_deviation)[0]  # [0] selects the single column as a Series
over_line = (smooth_path + path_deviation)[0]

# Plotting:
plt.plot(smooth_path, linewidth=2)  # mean curve
plt.fill_between(path_deviation.index, under_line, over_line, color='b', alpha=.1)  # +/- 2 std band
plt.show()

With the above code you obtain something like this:

[image: rolling mean curve with the shaded band]
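The same idea can be applied directly to the series from the question. This is a minimal sketch, assuming you would rather center the window on each point (center=True, a standard rolling parameter) so the mean does not lag behind the data. rolling_band is a hypothetical helper name, the seeded NumPy generator is only there for reproducibility (the question uses np.random.rand()), and the Agg backend plus savefig let the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def rolling_band(values, window, n_std=2.0, center=True):
    """Hypothetical helper: rolling mean and a +/- n_std * rolling std band."""
    roll = pd.Series(values).rolling(window, center=center)
    mean = roll.mean()
    std = roll.std()
    return mean, mean - n_std * std, mean + n_std * std


# The example series from the question: three noisy plateaus at 10, 12 and 14.
rng = np.random.default_rng(0)
series = np.concatenate([
    10 + rng.random(100),
    12 + rng.random(100),
    14 + rng.random(100),
])

mean, lower, upper = rolling_band(series, window=15)

fig, ax = plt.subplots()
ax.plot(series, color="0.7", linewidth=0.8, label="raw data")
ax.plot(mean.index, mean, color="b", linewidth=2, label="rolling mean")
ax.fill_between(mean.index, lower, upper, color="b", alpha=0.1, label="+/- 2 std")
ax.legend()
fig.savefig("rolling_band.png")
```

Centering removes the lag of roughly window / 2 points that a trailing window introduces. With an odd window of 15, the first and last 7 points have no complete window, so the mean and band start and stop 7 points in from each end.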

answered Sep 17 '22 09:09 by Ștefan


It looks like you're doubling the std twice. I think it should be like this:

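import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

time_series_df = pd.DataFrame(time_series_array)  # time_series_array as defined in the answer above
smooth_path = time_series_df.rolling(20).mean()  # rolling mean over 20 points
path_deviation = time_series_df.rolling(20).std()  # rolling std, scaled by 2 only when plotting
plt.plot(smooth_path, linewidth=2)
plt.fill_between(path_deviation.index, (smooth_path-2*path_deviation)[0], (smooth_path+2*path_deviation)[0], color='b', alpha=.1)
plt.show()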

answered Sep 19 '22 09:09 by flrndttrch
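A small variant, not taken from either answer: wrapping the array in a pd.Series instead of a DataFrame removes the need for the [0] column selection, and Rolling.agg computes both statistics from one rolling object. This is a sketch under the assumption that a single 1-D series is being plotted; the window of 20 mirrors the snippet above and the random data is illustrative only.

```python
import numpy as np
import pandas as pd

# Same kind of series as in the first answer (values are random).
time_series_array = np.sin(np.linspace(-np.pi, np.pi, 400)) + np.random.rand(400)

# A Series has no column axis, so each statistic comes back as a plain column:
# stats is a DataFrame with columns "mean" and "std", and no [0] is needed.
stats = pd.Series(time_series_array).rolling(20).agg(["mean", "std"])

lower = stats["mean"] - 2 * stats["std"]
upper = stats["mean"] + 2 * stats["std"]

# Plot as before, for example:
# plt.plot(stats["mean"], linewidth=2)
# plt.fill_between(stats.index, lower, upper, color='b', alpha=.1)
```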