According to my understanding, circular variance has a range between 0 and 1. This is also confirmed on Wikipedia as well as here. But for some reason, the circular variance function from scipy.stats gives values above 1.
import numpy as np
from scipy.stats import circmean, circvar
a = np.random.randint(0, high=360, size=10)
print(a)
print(circmean(a, 0, 360))
print(circvar(np.deg2rad(a)))
[143 116 152 172 349 152 182 306 345 81]
135.34974541954665
2.2576538466653857
Could somebody tell me why I am getting values above 1 from the circvar function?
The less-helpful answer would be "because that's how scipy defines it", so you'd better ask the developers to get a definitive answer. Really. The example from the docs is
import numpy as np
from scipy.stats import circvar
circvar([0, 2*np.pi/3, 5*np.pi/3])
2.19722457734
So you can't say the behavior is unexpected. But why is it done that way?
Your second link defines the circular variance for a set of n angles a_1, \ldots, a_n as
V = 1 - \hat{R}_1
where
\hat{R}_1 = R_1 / n, \qquad R_1 = \sqrt{C^2 + S^2}
and
C = \sum_{i=1}^{n} \cos(a_i), \qquad S = \sum_{i=1}^{n} \sin(a_i)
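To make that concrete, here is a minimal sketch (plain numpy, variable names are my own) that applies this textbook definition to the angles from the docs example; the result stays within [0, 1]:
import numpy as np

a = np.array([0, 2*np.pi/3, 5*np.pi/3])   # angles in radians, from the docs example
C = np.cos(a).sum()
S = np.sin(a).sum()
R1 = np.hypot(C, S)          # resultant length R_1
R1_bar = R1 / len(a)         # mean resultant length, always in [0, 1]
V = 1 - R1_bar               # textbook circular variance, always in [0, 1]
print(V)                     # about 0.667 here, not 2.197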
The scipy library finds the circular variance by
ang = (samples - low)*2.*pi / (high - low)
S = sin(ang).mean(axis=axis)
C = cos(ang).mean(axis=axis)
R = hypot(S, C)
return ((high - low)/2.0/pi)**2 * 2 * log(1/R)
That's a bit tricky to understand. If we assume the default range low=0, high=2*pi and the default axis (both true in the example), the scaling to and from that range drops out and it can be simplified to
S = mean(sin(samples))
C = mean(cos(samples))
R = hypot(S, C)
V = 2 * log(1/R)
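As a quick check (again just a numpy sketch on the docs example), this simplified form reproduces the 2.19722457734 from the docs and shows why the value can exceed 1:
import numpy as np

samples = np.array([0, 2*np.pi/3, 5*np.pi/3])
S = np.mean(np.sin(samples))
C = np.mean(np.cos(samples))
R = np.hypot(S, C)             # mean resultant length, 1/3 for this data
print(2 * np.log(1 / R))       # 2.1972..., matching the docs example
# 2*log(1/R) exceeds 1 as soon as R < exp(-1/2) ~ 0.607,
# whereas 1 - R always stays within [0, 1].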
So the definition used by scipy transforms R by 2*log(1/R), rather than 1-R. That seems odd. Looking through the history, https://github.com/scipy/scipy/blame/v1.1.0/scipy/stats/morestats.py#L2696-L2733, at one point the stats were calculated using
ang = (samples - low)*2*pi / (high-low)
res = stats.mean(exp(1j*ang))
V = 1-abs(res)
return ((high-low)/2.0/pi)**2 * V
That seems in line with the definitions you've provided. It was changed in a bugfix at the same time tests were added, but without any reference as to where the new computation came from.
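To see why that older code matches the textbook definition: the magnitude of the mean of exp(1j*ang) is exactly the mean resultant length \hat{R}_1. A small sketch (my own check, not scipy code):
import numpy as np

ang = np.array([0, 2*np.pi/3, 5*np.pi/3])
res = np.mean(np.exp(1j * ang))   # each angle as a unit vector in the complex plane
print(abs(res))                   # equals hypot(mean(sin), mean(cos)) = 1/3
print(1 - abs(res))               # 0.666..., the textbook circular variance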
Some discussion on the scipy bug tracker is available at https://github.com/scipy/scipy/pull/5747. It suggests the behavior is intentional, and won't be fixed. There's another implementation available in astropy, http://docs.astropy.org/en/stable/api/astropy.stats.circvar.html, which notes
The definition used here differs from the one in scipy.stats.circvar. Precisely, Scipy circvar uses an approximation based on the limit of small angles which approaches the linear variance.
So, in summary: for unknown reasons scipy uses an approximation (that seems to be rather poor in some cases). However, due to backwards compatibility it won't be fixed, so you may want to use astropy's implementation instead.
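For example (a sketch assuming astropy is installed; astropy.stats.circvar expects angles in radians, and the astro_circvar alias is only there to avoid the name clash), running both implementations on the sample data from the question:
import numpy as np
from scipy.stats import circvar
from astropy.stats import circvar as astro_circvar

a = np.array([143, 116, 152, 172, 349, 152, 182, 306, 345, 81])  # degrees, from the question
print(circvar(np.deg2rad(a)))        # scipy: above 1 for this sample (2.2576... in the question)
print(astro_circvar(np.deg2rad(a)))  # astropy: 1 - R, about 0.68 here, always within [0, 1]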