How are the "error bands" in Seaborn tsplot calculated?

Tags:

python

statistics

seaborn

I'm trying to understand how the error bands are calculated in the tsplot. Examples of the error bands are shown here.

When I plot something simple like

sns.tsplot(np.array([[0,1,0,1,0,1,0,1], [1,0,1,0,1,0,1,0], [.5,.5,.5,.5,.5,.5,.5,.5]]))

I get a horizontal line at y=0.5 as expected. The top error band is also a horizontal line at around y=0.665 and the bottom error band is a horizontal line at around y=0.335. Can someone explain how these are derived?

510

asked Apr 06 '15 23:04

theQman

2 Answers

EDIT: The question and this answer refer to old versions of Seaborn and are not relevant for new versions. See @CGFoX's comment below.

I'm not a statistician, but I read through the seaborn code in order to see exactly what's happening. There are three steps:

1. Bootstrap resampling. Seaborn creates resampled versions of your data. Each of these is a 3x8 matrix like yours, but each row is randomly selected (with replacement) from the three rows of your input. For example, one might be:
```
[[ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]]
```
and another might be:
```
[[ 1.   0.   1.   0.   1.   0.   1.   0. ]
 [ 0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5]
 [ 0.   1.   0.   1.   0.   1.   0.   1. ]]
```
It creates n_boot of these (10000 by default).

2. Central tendency estimation. Seaborn runs a function on each of the columns of each of the n_boot resampled versions of your data. Because you didn't specify this argument (estimator), it feeds the columns to a mean function (numpy.mean with axis=0). Many of the columns in your bootstrap iterations will have a mean of 0.5, because they will be things like [0, 0.5, 1], [0.5, 1, 0], [0.5, 0.5, 0.5], etc., but you will also get some [1,1,0] and even some [1,1,1], which result in higher means.

3. Confidence interval determination. For each column, seaborn sorts the bootstrap estimates of the mean (one per resampled version of the data) from smallest to greatest, and picks the ones which mark the lower and upper bounds of the CI. By default, it uses a 68% CI, so if you lined up, say, 1000 mean estimates, it would pick roughly the 160th and the 840th (840-160 = 680, or 68% of 1000). With n_boot estimates the same percentiles (16th and 84th) are used.
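
The three steps above can be sketched with plain NumPy. This is only an illustration under the assumptions described in this answer (rows are the resampled units, the estimator is the mean, the band is a percentile interval); it is not seaborn's actual code, and `bootstrap_band` with its parameters is a name invented for this example.

```python
import numpy as np


def bootstrap_band(data, n_boot=10000, ci=68, estimator=np.mean, seed=0):
    """Percentile bootstrap band over the rows (units) of a 2D array."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n_units = data.shape[0]

    # Step 1: draw row indices with replacement, one set per bootstrap iteration.
    idx = rng.integers(0, n_units, size=(n_boot, n_units))
    resampled = data[idx]  # shape: (n_boot, n_units, n_timepoints)

    # Step 2: apply the estimator down the columns of every resampled array.
    estimates = estimator(resampled, axis=1)  # shape: (n_boot, n_timepoints)

    # Step 3: keep the central `ci` percent of the estimates in each column.
    low, high = np.percentile(estimates, [50 - ci / 2, 50 + ci / 2], axis=0)
    return low, high


data = np.array([[0, 1, 0, 1, 0, 1, 0, 1],
                 [1, 0, 1, 0, 1, 0, 1, 0],
                 [.5, .5, .5, .5, .5, .5, .5, .5]])

low, high = bootstrap_band(data)
print(low)   # lower edge of the band, one value per column
print(high)  # upper edge of the band, one value per column
```

For the array in the question, both edges should land very close to 1/3 and 2/3 in every column, which is in line with the band the asker observed.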

A couple of notes:

- There are actually only 3^3, or 27, possible resampled versions of your array, and if you use a function such as mean where the order of the rows doesn't matter, then only 10 of them are distinct (the number of ways to pick 3 rows out of 3 with replacement, ignoring order). So all 10000 bootstrap iterations will be identical to one of those 27 versions, or 10 versions in the unordered case. This means that it's probably silly to do 10000 iterations in this case.
- The means 0.3333... and 0.6666... that show up as your confidence intervals are the means for [1,0,0] and [1,1,0] respectively, or rearranged versions of those. Columns such as [0.5,0.5,0] and [0.5,0.5,1] give the same two means.
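
Because there are so few possible resamples, the bootstrap distribution for this example can be enumerated exactly instead of sampled. The sketch below does that for the first column of the question's array (values 0, 1 and 0.5); the variable names are made up for illustration, and it assumes the band is the 16th to 84th percentile of the resampled means.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

import numpy as np

# First column of the question's array: one value per row (unit).
column = [Fraction(0), Fraction(1), Fraction(1, 2)]

# Every ordered way to draw 3 rows with replacement: 3**3 = 27.
ordered = list(product(range(3), repeat=3))

# Ignoring the order of the drawn rows leaves the distinct resamples.
unordered = {tuple(sorted(draw)) for draw in ordered}

# Exact mean of the column for each of the 27 equally likely draws.
means = sorted(sum(column[i] for i in draw) / 3 for draw in ordered)
mean_counts = Counter(means)

# 68% percentile interval over the exact bootstrap distribution.
low, high = np.percentile([float(m) for m in means], [16, 84])

print(len(ordered), len(unordered))
for mean, count in sorted(mean_counts.items()):
    print(mean, count)
print(low, high)
```

Seven of the 27 draws give a mean of exactly 1/2, and the 16th and 84th percentiles fall on 1/3 and 2/3.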

134

answered Oct 19 '22 20:10

foobarbecue

They show a bootstrap confidence interval, computed by resampling units (rows in the 2d array input form). By default it shows a 68 percent confidence interval, which is roughly equivalent to one standard error on either side of the estimate, but this can be changed with the ci parameter.
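
To see how a 68 percent bootstrap interval relates to the standard error, here is a small sketch on made-up data (200 hypothetical units at a single time point, drawn from a normal distribution). It only uses NumPy and is not how seaborn is implemented; with very few units, as in the three-row example from the question, the two can differ noticeably.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 units observed at one time point.
values = rng.normal(loc=0.0, scale=1.0, size=200)

# Standard error of the mean from the usual formula.
sem = values.std(ddof=1) / np.sqrt(values.size)

# 68% percentile bootstrap interval for the mean, resampling units.
idx = rng.integers(0, values.size, size=(20000, values.size))
boot_means = values[idx].mean(axis=1)
low, high = np.percentile(boot_means, [16, 84])

print(values.mean() - sem, values.mean() + sem)  # mean +/- one standard error
print(low, high)                                 # bootstrap 68% interval
```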

answered Oct 19 '22 20:10

mwaskom
