I'm trying to understand how the error bands are calculated in the tsplot. Examples of the error bands are shown here.
When I plot something simple like
sns.tsplot(np.array([[0,1,0,1,0,1,0,1], [1,0,1,0,1,0,1,0], [.5,.5,.5,.5,.5,.5,.5,.5]]))
I get a horizontal line at y=0.5, as expected. The top error band is also a horizontal line at around y=0.665, and the bottom error band is a horizontal line at around y=0.335. Can someone explain how these are derived?
Additional learning: bootstrapping. Seaborn uses bootstrapping to calculate the confidence interval of the data (95% by default in sns.lineplot). In essence, it repeatedly resamples, with replacement, from the sample you have and recomputes the statistic each time; the spread of those recomputed values estimates how uncertain the mean is.
You can pass ci=None, as in sns.lineplot(..., ci=None), to suppress the confidence interval.
You can also plot markers on a Seaborn line plot. Markers are symbols drawn at each data point, that is, at each (x, y) pair on the line. To plot markers, pass a list of symbols to the markers parameter. Each symbol corresponds to one line.
You probably need to reorganize your dataframe so that there is one column for the x data, one for the y data, and one that holds the label for each data point. You can also just use matplotlib.pyplot. If you import seaborn, much of the improved design is also applied to "regular" matplotlib plots.
Seaborn v0.8.0 (July 2017) added the ability to show standard deviations rather than bootstrap confidence intervals in most statistical functions by passing ci="sd". For earlier Seaborn versions, a workaround for plotting the standard deviation is to draw a matplotlib errorbar on top of the seaborn tsplot.
If you get the error "AttributeError: module 'seaborn' has no attribute 'tsplot'", downgrade your seaborn version. Version seaborn==0.9.0 has been checked to be compatible with the tsplot function.
EDIT: The question and this answer refer to old versions of Seaborn and are not relevant for new versions. See @CGFoX's comment below.
I'm not a statistician, but I read through the seaborn code in order to see exactly what's happening. There are three steps:
Bootstrap resampling. Seaborn creates resampled versions of your data. Each of these is a 3x8 matrix like yours, but each row is randomly selected from the three rows of your input. For example, one might be:
[[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]]
and another might be:
[[ 1. 0. 1. 0. 1. 0. 1. 0. ]
[ 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
[ 0. 1. 0. 1. 0. 1. 0. 1. ]]
It creates n_boot of these (10000 by default).
Central tendency estimation. Seaborn runs a function on each of the columns of each of the 10000 resampled versions of your data. Because you didn't specify this argument (estimator), it feeds the columns to a mean function (numpy.mean with axis=0). Lots of the columns in your bootstrap iterations are going to have a mean of 0.5, because they will be things like [0, 0.5, 1], [0.5, 1, 0], [0.5, 0.5, 0.5], etc., but you will also have some [1,1,0] and even some [1,1,1], which will result in higher means.
Confidence interval determination. For each column, seaborn sorts the estimates of the mean calculated from each resampled version of the data from smallest to greatest, and picks the ones that mark the lower and upper ends of the CI. By default it uses a 68% CI, which means the 16th and 84th percentiles. To make that concrete: if you lined up 1000 mean estimates in order, it would pick roughly the 160th and the 840th (840 - 160 = 680, or 68% of 1000); with 10000 estimates it is the same idea at the 1600th and 8400th.
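The three steps above can be sketched in plain Python. This is only an illustration of the idea, not seaborn's code: the function name bootstrap_ci, the seed argument, and the rank-lookup percentile (no interpolation) are my own assumptions, and it uses the standard library instead of numpy so it runs anywhere.

```python
import random


def bootstrap_ci(data, n_boot=10000, ci=68, seed=0):
    """Return (lower, upper) lists with one bound per column of `data`.

    Hypothetical helper for illustration; not part of seaborn.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])

    # Steps 1 and 2: resample whole rows with replacement, then take the
    # column-wise mean of each resampled matrix.
    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=n_rows)
        boot_means.append(
            [sum(row[j] for row in sample) / n_rows for j in range(n_cols)]
        )

    # Step 3: per column, sort the estimates and read off the two
    # percentiles by rank (a simplification: no interpolation).
    lo_idx = round((50 - ci / 2) / 100 * (n_boot - 1))
    hi_idx = round((50 + ci / 2) / 100 * (n_boot - 1))
    lower, upper = [], []
    for j in range(n_cols):
        col = sorted(m[j] for m in boot_means)
        lower.append(col[lo_idx])
        upper.append(col[hi_idx])
    return lower, upper


data = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [.5, .5, .5, .5, .5, .5, .5, .5],
]
lower, upper = bootstrap_ci(data)
print([round(v, 3) for v in lower])
print([round(v, 3) for v in upper])
```

For the array in the question, the bounds should come out close to 0.333 and 0.667 in every column, which matches the bands read off the plot to within plotting and interpolation accuracy.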
A couple of notes:
There are actually only 3^3, or 27, possible resampled versions of your array, and if you use a function such as mean, where the order of the rows doesn't matter, only 10 of them are distinct (the number of ways to pick 3 rows out of 3 with replacement, ignoring order). So all 10000 bootstrap iterations will be identical to one of those 27 versions, or 10 versions in the unordered case. This means that it's probably silly to do 10000 iterations in this case.
The means 0.3333... and 0.6666... that show up as your confidence limits are the column means for resamples like [1,0,0] and [1,1,0] respectively (or [0.5,0.5,0] and [0.5,0.5,1]), or rearranged versions of those.
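Because the input is so small, the whole bootstrap distribution can be enumerated exactly instead of sampled. The sketch below does that for a single column holding the values 0, 1 and 0.5 (the variable names are mine), counting the ordered and unordered resamples and tabulating how often each mean occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# One column of the input: the values contributed by rows 0, 1 and 2.
column = [Fraction(0), Fraction(1), Fraction(1, 2)]

# Every way to draw 3 row indices with replacement, order included.
ordered = list(product(range(3), repeat=3))
# The same draws with order ignored.
unordered = {tuple(sorted(draw)) for draw in ordered}

# Exact distribution of the column mean over the ordered draws.
mean_counts = Counter(sum(column[i] for i in draw) / 3 for draw in ordered)

print(len(ordered), len(unordered))  # 27 10
for mean in sorted(mean_counts):
    print(mean, mean_counts[mean])
```

The means 0, 1/6, 1/3, 1/2, 2/3, 5/6 and 1 occur 1, 3, 6, 7, 6, 3 and 1 times out of 27. Only 4/27 (about 15%) of resamples have a mean at or below 1/6, so the 16th percentile lands on 1/3, and by symmetry the 84th lands on 2/3.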
They show a bootstrap confidence interval, computed by resampling units (rows in the 2d array input form). By default it shows a 68 percent confidence interval, which is roughly equivalent to plus or minus one standard error, but this can be changed with the ci parameter.
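To see what changing ci does, it helps to write out how an interval width maps to a pair of percentiles. The sketch below is an assumption-laden illustration, not seaborn's routine: percentile_bounds and interval are hypothetical helper names, and the percentile is taken by simple rank lookup on the 27 exact bootstrap means of one column, so wider intervals may differ slightly from what an interpolating percentile function would return.

```python
from itertools import product


def percentile_bounds(ci):
    """Map a central interval width (e.g. 68) to its lower/upper percentiles."""
    return 50 - ci / 2, 50 + ci / 2


def interval(sorted_values, ci):
    """Pick the values at the two percentiles by rank lookup (no interpolation)."""
    lo_pct, hi_pct = percentile_bounds(ci)
    last = len(sorted_values) - 1
    lo = sorted_values[round(lo_pct / 100 * last)]
    hi = sorted_values[round(hi_pct / 100 * last)]
    return lo, hi


# All 27 bootstrap means for a column holding the values 0, 1 and 0.5.
column = (0, 1, 0.5)
means = sorted(sum(draw) / 3 for draw in product(column, repeat=3))

for ci in (68, 95):
    print(ci, percentile_bounds(ci), interval(means, ci))
```

A wider ci pushes the two percentiles further into the tails, so the band around the line can only stay the same or grow.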