Edit: Basically solved, I think.
I am using spearmanr from scipy.stats to find the correlations between variables across a number of different samples. I have around 2500 variables and 36 samples (or 'observations').
If I calculate the correlations using all 36 samples, spearmanr works fine. If I use only the first 18 samples, it also works fine. However, if I use the last 18 samples, I get an error and NaNs are returned.
This is the error (a series of RuntimeWarnings):
/Home/s1215235/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1945: RuntimeWarning: invalid value encountered in true_divide
return c / sqrt(multiply.outer(d, d))
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in greater
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in less
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1719: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
This is the code:
import numpy as np
from scipy import stats
populationdata = np.vstack(thing).astype(float)  # np.float was removed in NumPy 1.24
rho, pval = stats.spearmanr(populationdata[:, sampleindexes], axis=1)
(populationdata is a NumPy array of floats; [:,sampleindexes] selects only some of the columns, i.e. a subset of the samples.)
And this is what is returned as rho:
[[ 1.                 nan         nan ...,  1.         -0.05882353 -0.08574929]
 [        nan         nan         nan ...,         nan         nan         nan]
 [        nan         nan         nan ...,         nan         nan         nan]
 ...,
 [ 1.                 nan         nan ...,  1.         -0.05882353 -0.08574929]
 [-0.05882353         nan         nan ..., -0.05882353  1.          0.68599434]
 [-0.08574929         nan         nan ..., -0.08574929  0.68599434  1.        ]]
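The all-NaN rows above suggest that some variables have no variation within the selected samples. Assuming that is the cause, a quick check could look like the sketch below. constant_rows and the toy array are hypothetical stand-ins for populationdata and sampleindexes, not part of NumPy or SciPy.

```python
import numpy as np

def constant_rows(data, columns):
    """Hypothetical helper: indices of rows of `data` that take a single
    value across the selected `columns` (such rows make spearmanr return nan)."""
    subset = data[:, columns]
    # A row is constant when its largest and smallest values coincide.
    return np.flatnonzero(subset.max(axis=1) == subset.min(axis=1))

# Toy stand-in for populationdata: 4 variables (rows) x 6 samples (columns).
populationdata = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 0.0, 0.0],
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    [2.0, 7.0, 1.0, 8.0, 2.0, 8.0],
])
first_half = np.arange(0, 3)
second_half = np.arange(3, 6)

print(constant_rows(populationdata, first_half))   # [0]
print(constant_rows(populationdata, second_half))  # [0 1]
```

Row 1 varies in the first half but is all zeros in the second half, which mirrors the "first 18 works, last 18 fails" pattern.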
Why does spearmanr output NaN? If there is no variation in sequence_1, its standard deviation is 0, which results in a division by zero inside the spearmanr() function, so NaN is returned. What is the equivalent value of NaN in that case?
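The division by zero can be made concrete by computing Spearman's rho by hand as the Pearson correlation of the ranks. A minimal sketch with made-up sequences; it shows that both numerator and denominator are zero, so the coefficient is simply undefined.

```python
import numpy as np
from scipy.stats import rankdata

# Toy data: sequence_1 has no variation, sequence_2 does.
sequence_1 = np.array([0.0, 0.0, 0.0, 0.0])
sequence_2 = np.array([1.0, 3.0, 2.0, 4.0])

# Tied values share the average rank, so sequence_1's ranks are all 2.5.
r1 = rankdata(sequence_1)
r2 = rankdata(sequence_2)

# Spearman's rho is Pearson's r on the ranks: cov(r1, r2) / (std(r1) * std(r2)).
cov = np.mean((r1 - r1.mean()) * (r2 - r2.mean()))
denominator = r1.std() * r2.std()
print(cov, denominator)  # both are zero

# 0/0 is undefined; NumPy returns nan (errstate only silences the warning).
with np.errstate(invalid="ignore"):
    rho = cov / denominator
print(rho)  # nan
```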
Say you have two n-tuples, x and y, where (x₁, y₁), (x₂, y₂), … are the observations as pairs of corresponding values. You can calculate the Spearman correlation coefficient ρ the same way as the Pearson coefficient, except that it is computed on the ranks of x and y rather than on the raw values.
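A minimal sketch checking that relationship with SciPy: spearmanr on the raw values should agree with pearsonr on the ranks produced by rankdata. The x and y values are made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Paired observations (x_i, y_i); the values are invented for illustration.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Spearman's rho computed directly ...
rho_spearman, _ = spearmanr(x, y)
# ... and as Pearson's r applied to the ranks of each sequence.
rho_from_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(rho_spearman, rho_from_ranks)  # both approximately 0.8
```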
The scipy.stats.nanmean(array, axis=0) function calculates the arithmetic mean along the specified axis of the array while ignoring NaN (not a number) values. It has since been removed from SciPy; numpy.nanmean(array, axis=0) does the same thing.
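A minimal sketch of NaN-ignoring means with numpy.nanmean; the array is made up for illustration.

```python
import numpy as np

# A small made-up array with missing values marked as NaN.
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan]])

column_means = np.nanmean(data, axis=0)  # 2.5, 5.0, 3.0
row_means = np.nanmean(data, axis=1)     # 2.0, 4.5
print(column_means, row_means)
```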
NumPy has many statistics routines, including np.corrcoef(), which returns a matrix of Pearson correlation coefficients. To use it, import NumPy and define two NumPy arrays (instances of the class ndarray), x and y; for example, np.arange(10, 20) creates an array x of the integers from 10 (inclusive) to 20 (exclusive).
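A minimal example of np.corrcoef with an x built by np.arange(10, 20); the values in y are invented for illustration.

```python
import numpy as np

x = np.arange(10, 20)  # integers 10 to 19
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])  # made-up values

# np.corrcoef treats each input as a variable and returns a 2x2 matrix:
# the diagonal holds each variable's correlation with itself (1.0),
# the off-diagonal entries hold Pearson's r between x and y.
r = np.corrcoef(x, y)
print(r)
print(r[0, 1])
```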
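In a comment it was noted that "There are a lot of 0s though." So populationdata[:,sampleindexes] probably has rows that are all 0, i.e. variables that are 0 in every one of the selected samples. All the values in such a row share the same rank, so the standard deviation of its ranks is 0 and any correlation involving it is a division by zero. That will cause spearmanr to generate nan. For example,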
In [3]: spearmanr([[0, 0, 0], [1, 2, 3]], axis=1)
/Users/warren/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py:1957: RuntimeWarning: invalid value encountered in true_divide
return c / sqrt(multiply.outer(d, d))
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in greater
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in less
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1729: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
Out[3]: (nan, nan)
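If the all-zero rows are expected (variables that are simply 0 throughout the last 18 samples), one option is to leave them out before calling spearmanr and keep track of which rows remain. A minimal sketch; spearman_nonconstant is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy import stats

def spearman_nonconstant(data):
    """Hypothetical helper: Spearman rho between the rows of `data`,
    skipping rows that are constant (e.g. all zeros).

    Returns (rho, kept): the correlation matrix for the kept rows and
    the original indices of those rows.
    """
    # A constant row has no rank variance, so its correlation is undefined.
    kept = np.flatnonzero(data.max(axis=1) != data.min(axis=1))
    if kept.size < 2:
        raise ValueError("need at least two non-constant rows")
    rho, _ = stats.spearmanr(data[kept], axis=1)
    if kept.size == 2:
        # With exactly two variables spearmanr returns a scalar, not a matrix.
        rho = np.array([[1.0, rho], [rho, 1.0]])
    return rho, kept

# Toy example: row 0 is all zeros, like the variables in the question.
data = np.array([[0.0, 0.0, 0.0],
                 [1.0, 2.0, 3.0],
                 [3.0, 1.0, 2.0],
                 [1.0, 3.0, 2.0]])
rho, kept = spearman_nonconstant(data)
print(kept)  # [1 2 3]
print(rho)
```

For the question's data this would be called as spearman_nonconstant(populationdata[:, sampleindexes]); kept maps each row and column of rho back to the original variable index.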