I have some time series that increase slowly overall but are very wavy over short periods. For example, a time series could look like:
[10 + np.random.rand() for i in range(100)] + [12 + np.random.rand() for i in range(100)] + [14 + np.random.rand() for i in range(100)]
I would like to plot the time series with a focus on the general trend, not on the small waves. Is there a way to plot the mean over a period of time, surrounded by a stripe indicating the waves (the stripe should represent the confidence interval, i.e. where a data point could fall at that moment)?
A simple plot would look like this:
The plot I would like, with confidence intervals, would look like this:
Is there an elegant way to do it in Python?
1. Create a new sample from the dataset by drawing with replacement, using the same number of points.
2. Calculate the mean of that sample and store it in an array or list.
3. Repeat the process many times (e.g. 1000).
4. On the list of mean values, calculate the 2.5th and 97.5th percentiles (if you want a 95% confidence interval).
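The resampling steps above can be sketched roughly as follows. This is only an illustration: the function name bootstrap_mean_ci and its parameters (n_resamples, confidence, seed) are made up for this example and do not come from any library.

```python
import numpy as np


def bootstrap_mean_ci(data, n_resamples=1000, confidence=0.95, seed=None):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Draw n_resamples samples with replacement, each as long as the data.
    samples = rng.choice(data, size=(n_resamples, data.size), replace=True)

    # One mean per resample.
    means = samples.mean(axis=1)

    # For 95% confidence these are the 2.5th and 97.5th percentiles.
    tail = (1.0 - confidence) / 2.0
    lower, upper = np.percentile(means, [100 * tail, 100 * (1 - tail)])
    return lower, upper


# Example: one 100-point block like those in the question.
block = 10 + np.random.default_rng(0).random(100)
low, high = bootstrap_mean_ci(block, seed=1)
print(low, high)
```

Note that this gives an interval for the mean of one set of points. To get a stripe along a time series, one option would be to apply it to each window of consecutive points in turn.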
You could use the pandas function rolling(n) to generate the mean and standard deviation values over n consecutive points.
For the shade of the confidence intervals (represented by the space between standard deviations) you can use the function fill_between() from matplotlib.pyplot. For more information you could take a look over here, from which the following code is inspired.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Declare the array containing the series you want to plot.
# For example:
time_series_array = np.sin(np.linspace(-np.pi, np.pi, 400)) + np.random.rand(400)
n_steps = 15  # number of points in each rolling window for the mean/std
# Compute the curves of interest:
time_series_df = pd.DataFrame(time_series_array)
smooth_path = time_series_df.rolling(n_steps).mean()
path_deviation = 2 * time_series_df.rolling(n_steps).std()  # band half-width: 2 standard deviations
under_line = (smooth_path - path_deviation)[0]
over_line = (smooth_path + path_deviation)[0]
# Plotting:
plt.plot(smooth_path, linewidth=2)  # mean curve
plt.fill_between(path_deviation.index, under_line, over_line, color='b', alpha=.1)  # band of mean +/- 2 std
plt.show()
With the above code you obtain something like this:
It looks like you're doubling the std twice. I guess it should be like this:
time_series_df = pd.DataFrame(time_series_array)
smooth_path = time_series_df.rolling(20).mean()
path_deviation = time_series_df.rolling(20).std()
plt.plot(smooth_path, linewidth=2)
plt.fill_between(path_deviation.index, (smooth_path-2*path_deviation)[0], (smooth_path+2*path_deviation)[0], color='b', alpha=.1)
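One detail worth knowing about the rolling approach: with its default arguments, rolling(n) uses a trailing window, so the first n - 1 values of the mean and std are NaN and the smoothed curve lags slightly behind the data. A possible variant, sketched below on the series from the question, uses a centered window and allows partial windows at the edges. The helper names rolling_band and plot_band are made up for this example, and the choices min_periods=2 and a band of 2 standard deviations are assumptions, not requirements.

```python
import numpy as np
import pandas as pd


def rolling_band(values, window=15, n_std=2.0):
    """Return (mean, lower, upper) as pandas Series for a centered rolling window.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    series = pd.Series(np.asarray(values, dtype=float))
    # center=True aligns each window on its middle point;
    # min_periods=2 lets the shortened windows at the edges still yield a std.
    roller = series.rolling(window, min_periods=2, center=True)
    mean = roller.mean()
    spread = n_std * roller.std()
    return mean, mean - spread, mean + spread


def plot_band(values, window=15):
    """Plot the rolling mean with a shaded band of mean +/- 2 std."""
    import matplotlib.pyplot as plt

    mean, lower, upper = rolling_band(values, window)
    plt.plot(mean.index, mean, linewidth=2)
    plt.fill_between(mean.index, lower, upper, color='b', alpha=.1)
    plt.show()


# The stepped example series from the question.
rng = np.random.default_rng(0)
values = np.concatenate([level + rng.random(100) for level in (10, 12, 14)])
mean, lower, upper = rolling_band(values)
```

Calling plot_band(values) then draws the trend line and the stripe. Near the jumps between levels the band widens, because a window that straddles a jump has a larger standard deviation.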