
How are the "error bands" in Seaborn tsplot calculated?

I'm trying to understand how the error bands are calculated in the tsplot. Examples of the error bands are shown here.

When I plot something simple like

sns.tsplot(np.array([[0,1,0,1,0,1,0,1], [1,0,1,0,1,0,1,0], [.5,.5,.5,.5,.5,.5,.5,.5]]))

I get a horizontal line at y=0.5 as expected. The top error band is also a horizontal line at around y=0.665, and the bottom error band is a horizontal line at around y=0.335. Can someone explain how these are derived?

asked Apr 06 '15 23:04 by theQman



2 Answers

EDIT: The question and this answer refer to old versions of Seaborn and are not relevant to new versions. See @CGFoX's comment below.

I'm not a statistician, but I read through the seaborn code in order to see exactly what's happening. There are three steps:

  1. Bootstrap resampling. Seaborn creates resampled versions of your data. Each of these is a 3x8 matrix like yours, but each row is randomly selected from the three rows of your input. For example, one might be:

    [[ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
     [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
     [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]]
    

    and another might be:

    [[ 1.   0.   1.   0.   1.   0.   1.   0. ]
     [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
     [ 0.   1.   0.   1.   0.   1.   0.   1. ]]
    

    It creates n_boot of these (5000 by default in tsplot).

  2. Central tendency estimation. Seaborn runs a function on each column of every resampled version of your data. Because you didn't specify the estimator argument, it feeds the columns to a mean function (numpy.mean with axis=0). Many of the columns in your bootstrap iterations will have a mean of 0.5, because they will be things like [0, 0.5, 1], [0.5, 1, 0], [0.5, 0.5, 0.5], etc., but you will also get some [1, 1, 0] and even some [1, 1, 1], which result in higher means.

  3. Confidence interval determination. For each column, seaborn sorts the mean estimates calculated from the resampled versions of the data from smallest to greatest, and picks the ones that mark the lower and upper bounds of the CI. By default it uses a 68% CI, so if you lined up 1000 mean estimates, it would pick the 160th and the 840th (840 - 160 = 680, or 68% of 1000).
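The three steps above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not seaborn's actual code: the helper name `bootstrap_band`, its default of 5000 iterations, and the fixed seed are all choices made for the example, and the interval is taken with `np.percentile`.

```python
import numpy as np

def bootstrap_band(data, n_boot=5000, ci=68, estimator=np.mean, seed=0):
    """Hypothetical helper: percentile bootstrap band, resampling rows (units)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n_units = data.shape[0]

    # Step 1: resample whole rows with replacement, n_boot times.
    idx = rng.integers(0, n_units, size=(n_boot, n_units))
    resampled = data[idx]                          # (n_boot, n_units, n_time)

    # Step 2: apply the estimator down each column of each resample.
    boot_estimates = estimator(resampled, axis=1)  # (n_boot, n_time)

    # Step 3: take the percentiles that bracket the central `ci` percent.
    lower_pct = (100 - ci) / 2
    upper_pct = 100 - lower_pct
    low, high = np.percentile(boot_estimates, [lower_pct, upper_pct], axis=0)
    return estimator(data, axis=0), low, high

data = np.array([[0, 1, 0, 1, 0, 1, 0, 1],
                 [1, 0, 1, 0, 1, 0, 1, 0],
                 [.5, .5, .5, .5, .5, .5, .5, .5]])
center, low, high = bootstrap_band(data)
print(center)
print(low)
print(high)
```

For the array in the question, the center line is 0.5 everywhere, and the band should sit at about 1/3 and 2/3, because those are the 16th and 84th percentiles of the bootstrap means for every column.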

A couple of notes:

  • There are actually only 3^3, or 27, possible resampled versions of your array, and if you use a function such as mean where the order of the rows doesn't matter, only 10 of them are distinct (the number of ways to choose 3 rows from 3 with replacement, ignoring order). So every bootstrap iteration will be identical to one of those 27 versions, or 10 versions in the unordered case. This means that it's probably silly to run thousands of iterations in this case.

  • The means 0.3333... and 0.6666... that show up as your confidence bounds are the means of [1, 0, 0] and [1, 1, 0], respectively, or rearranged versions of those.
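Because the input is so small, the bootstrap distribution can be enumerated exactly instead of sampled. The sketch below does that for a single column of the question's array (values 0, 1 and 0.5); the variable names are illustrative. It counts 27 ordered resamples and 10 distinct unordered ones, and reads the 16th and 84th percentiles off the exact list of means.

```python
import itertools
from collections import Counter

import numpy as np

rows = [0.0, 1.0, 0.5]  # the three row values found in one column

# Every ordered way to draw 3 rows with replacement: 3**3 = 27.
ordered = list(itertools.product(rows, repeat=3))

# Resamples that differ only in row order give the same mean.
unordered = {tuple(sorted(r)) for r in ordered}

# Exact distribution of the bootstrap mean for this column.
mean_counts = Counter(round(sum(r) / 3, 4) for r in ordered)

means = np.sort([sum(r) / 3 for r in ordered])
low, high = np.percentile(means, [16, 84])

print(len(ordered), len(unordered))  # prints 27 10
print(sorted(mean_counts.items()))
print(low, high)
```

The counts show that means of 1/3 and 2/3 each occur in 6 of the 27 resamples, and that the 16th and 84th percentiles fall inside those two groups.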

answered Oct 19 '22 20:10 by foobarbecue


They show a bootstrap confidence interval, computed by resampling units (rows in the 2D array input form). By default the band is a 68 percent confidence interval, which is roughly equivalent to one standard error of the mean, but this can be changed with the ci parameter.
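The link between a 68 percent interval and a standard error can be checked numerically. The sketch below uses made-up normally distributed data (200 units, fixed seed), not anything from seaborn. It compares the half-width of a 16th-to-84th percentile bootstrap interval of the mean with the usual standard error formula. The two agree only approximately, and the match relies on the bootstrap distribution being close to normal.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=200)  # made-up sample of 200 units

# Standard error of the mean from the usual formula.
sem = values.std(ddof=1) / np.sqrt(values.size)

# Percentile bootstrap of the mean, resampling units with replacement.
n_boot = 4000
idx = rng.integers(0, values.size, size=(n_boot, values.size))
boot_means = values[idx].mean(axis=1)
low, high = np.percentile(boot_means, [16, 84])

half_width = (high - low) / 2
print(sem, half_width)
```

With only three units, as in the question, the bootstrap distribution is coarse and discrete, so the band snaps to values such as 1/3 and 2/3 rather than tracking the standard error closely.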

answered Oct 19 '22 20:10 by mwaskom