
How do I convert a list of correlations to a covariance matrix?

I have a list of correlations read from a text file in the following form (the first two values on each line identify the pair of points the correlation applies to):

2     1  -0.798399811877855E-01
3     1   0.357718108972297E+00
3     2  -0.406142457763738E+00
4     1   0.288467030571132E+00
4     2  -0.129115034405361E+00
4     3   0.156739504479856E+00
5     1  -0.756332254716083E-01
5     2   0.479036971438800E+00
5     3  -0.377545460300584E+00
5     4  -0.265467953118191E+00
6     1   0.909003414436468E-01
6     2  -0.363568902645620E+00
6     3   0.482042347959232E+00
6     4   0.292931692897587E+00
6     5  -0.739868576924150E+00

I already have another list with standard deviations associated with all of the points. How do I combine these two in numpy/scipy to create a covariance matrix?

It needs to be a very efficient method, since there are 300 points and therefore roughly 45,000 correlations (300 × 299 / 2).

Melanie asked May 15 '15 04:05



1 Answer

Assuming the table is loaded into a pandas DataFrame named df, with the first column labeled A, the second labeled B, and the correlation values labeled Correlation:

df2 = df.pivot(index='A', columns='B', values='Correlation')
>>> df2
B       1      2      3      4     5
A                                   
2 -0.0798    NaN    NaN    NaN   NaN
3  0.3580 -0.406    NaN    NaN   NaN
4  0.2880 -0.129  0.157    NaN   NaN
5 -0.0756  0.479 -0.378 -0.265   NaN
6  0.0909 -0.364  0.482  0.293 -0.74
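For completeness, here is a minimal sketch of how df itself might be built from the whitespace-separated text. The column names A, B and Correlation follow the assumption above; the in-memory string stands in for the real file, whose path is not given in the question.

```python
import io

import pandas as pd

# Three sample rows copied from the question; in practice a file path
# would be passed to read_csv instead of this StringIO object.
raw = """\
2     1  -0.798399811877855E-01
3     1   0.357718108972297E+00
3     2  -0.406142457763738E+00
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",                       # columns are separated by runs of spaces
    header=None,                      # the file has no header row
    names=["A", "B", "Correlation"],  # assumed column labels
)

# Rows become the first point, columns the second point.
df2 = df.pivot(index="A", columns="B", values="Correlation")
```

Pairs that do not appear in the file (here, row 2 and column 2) come out as NaN, which is why the pivoted table is lower triangular.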

To convert this into a symmetric square matrix with ones on the diagonal:

import numpy as np
import pandas as pd

# Collect every label that appears in the rows or columns, in sorted order
# (a plain set does not guarantee ordering).
items = sorted(set(df2.columns) | set(df2.index))

# Create square symmetric correlation matrix.
corr = df2.values.tolist()
corr.insert(0, [np.nan] * len(corr))  # Empty row for the first point.
corr = pd.DataFrame(corr)
corr[len(corr) - 1] = [np.nan] * len(corr)  # Empty column for the last point.
for i in range(len(corr)):
    corr.iat[i, i] = 1.  # Set diagonal to 1.00
    corr.iloc[i, i:] = corr.iloc[i:, i].values  # Mirror the lower triangle.

# Rename rows and columns.
corr.index = items
corr.columns = items

>>> corr
        1       2      3      4       5       6
1  1.0000 -0.0798  0.358  0.288 -0.0756  0.0909
2 -0.0798  1.0000 -0.406 -0.129  0.4790 -0.3640
3  0.3580 -0.4060  1.000  0.157 -0.3780  0.4820
4  0.2880 -0.1290  0.157  1.000 -0.2650  0.2930
5 -0.0756  0.4790 -0.378 -0.265  1.0000 -0.7400
6  0.0909 -0.3640  0.482  0.293 -0.7400  1.0000
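Since the question asks for efficiency, the same symmetric matrix can also be filled without a Python loop. This is only a sketch of an alternative, not the method used above: it assumes the points are labeled 1 to n with no gaps, and the names pairs, n and corr_np are made up for the example.

```python
import numpy as np

# Three (i, j, correlation) rows from the question, with 1-based point labels.
pairs = np.array([
    [2, 1, -0.798399811877855E-01],
    [3, 1,  0.357718108972297E+00],
    [3, 2, -0.406142457763738E+00],
])

n = 3  # number of points (assumed known; 300 in the question)
i = pairs[:, 0].astype(int) - 1  # convert labels to 0-based indices
j = pairs[:, 1].astype(int) - 1

corr_np = np.eye(n)          # start with ones on the diagonal
corr_np[i, j] = pairs[:, 2]  # fill the lower triangle
corr_np[j, i] = pairs[:, 2]  # mirror into the upper triangle
```

Both assignments are single vectorized operations, so the cost grows with the number of listed pairs rather than with nested Python iteration.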

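If your std dev data is not already in matrix form, convert it first. The expression below expects a square DataFrame with the same row and column labels as corr, in which every row holds the full vector of standard deviations, so that multiplying it by its transpose gives the product of the two standard deviations for each pair of points.

Assuming this matrix is named df_std, you can get the covariance matrix as follows: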

df_cov = corr.multiply(df_std.multiply(df_std.T.values))
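The relationship being used is cov[i, j] = corr[i, j] * std[i] * std[j]. If the standard deviations are simply a 1-D list, a sketch with np.outer avoids building a square std matrix by hand. The std values below are hypothetical and the correlations are the rounded values for the first three points shown above.

```python
import numpy as np

# Correlation matrix for the first three points (rounded values from above).
corr_np = np.array([
    [ 1.0,    -0.0798,  0.358],
    [-0.0798,  1.0,    -0.406],
    [ 0.358,  -0.406,   1.0],
])

# Hypothetical standard deviations, one per point, in the same order.
std = np.array([2.0, 0.5, 3.0])

# np.outer(std, std)[i, j] == std[i] * std[j], so the element-wise product
# scales every correlation by the two matching standard deviations.
cov = corr_np * np.outer(std, std)
```

The diagonal of cov then holds the variances (std squared), which is a quick sanity check on the ordering of the two inputs.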
Alexander answered Sep 29 '22 08:09
