
Generate correlated data in Python (3.3)

In R there is a function (cm.rnorm.cor, from the package CreditMetrics) that takes the number of samples, the number of variables, and a correlation matrix, and creates correlated data.

Is there an equivalent in Python?

asked Apr 15 '13 by PascalVKooten

People also ask

How do you create a correlated variable?

To generate correlated, normally distributed random samples, first generate uncorrelated standard normal samples, then multiply them by a matrix $C$ such that $C C^T = R$, where $R$ is the desired covariance matrix. $C$ can be obtained, for example, from the Cholesky decomposition of $R$, or from the eigenvalues and eigenvectors of $R$.
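A minimal NumPy sketch of both constructions, assuming an illustrative 3x3 matrix R (the values are made up for the example and happen to be a valid correlation matrix). With the eigendecomposition R = V diag(w) V^T, the choice C = V diag(sqrt(w)) also satisfies C C^T = R.

```python
import numpy as np

# Illustrative target covariance matrix R (assumed values; unit variances,
# so it is also a correlation matrix). It must be positive definite.
R = np.array([
    [1.0, 0.8, -0.3],
    [0.8, 1.0, 0.1],
    [-0.3, 0.1, 1.0],
])

rng = np.random.default_rng(0)
n = 100_000

# Uncorrelated standard normal samples: one row per variable, one column per draw.
z = rng.standard_normal((R.shape[0], n))

# Option 1: lower-triangular Cholesky factor, C_chol @ C_chol.T == R.
C_chol = np.linalg.cholesky(R)

# Option 2: eigendecomposition R = V diag(w) V^T, so C = V diag(sqrt(w)).
w, V = np.linalg.eigh(R)
C_eig = V @ np.diag(np.sqrt(w))

# Multiplying by C turns uncorrelated samples into correlated ones.
x_chol = C_chol @ z
x_eig = C_eig @ z

# The sample covariance of either result should be close to R.
print(np.round(np.cov(x_chol), 2))
print(np.round(np.cov(x_eig), 2))
```

Both factors give data with the same covariance; Cholesky is the cheaper option when R is strictly positive definite, while the eigendecomposition route can also be adapted to matrices that are only positive semidefinite.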

How do you plot a correlation in Python?

You can plot the correlation between two columns of a pandas DataFrame with seaborn's sns.regplot(x=df['column_1'], y=df['column_2']), which draws the two columns as a scatter plot with a fitted linear regression line.

Which function is used to find correlation in Python?

The SciPy function scipy.stats.pearsonr() calculates Pearson's correlation coefficient between two data samples of the same length, and also returns a p-value for testing the hypothesis of no correlation.
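For reference, the coefficient that pearsonr() reports can be computed directly from its definition. The sketch below uses only NumPy; pearson_r is a hypothetical helper written for illustration (it returns only the coefficient, not the p-value), and it is checked against np.corrcoef on synthetic data built to have a true correlation of 0.6.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length samples
    (hypothetical helper; scipy.stats.pearsonr also returns a p-value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("samples must have the same length")
    dx = x - x.mean()
    dy = y - y.mean()
    # Covariance divided by the product of the standard deviations.
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
# var(y) = 0.6**2 + 0.8**2 = 1 and cov(x, y) = 0.6, so the true correlation is 0.6.
y = 0.6 * x + 0.8 * rng.standard_normal(1000)

r = pearson_r(x, y)
print(r)
# NumPy's correlation matrix holds the same coefficient off the diagonal.
print(np.corrcoef(x, y)[0, 1])
```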


1 Answer

The method multivariate_normal of the Generator class in numpy.random is the function that you want.
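Note that multivariate_normal takes a mean vector and a covariance matrix, whereas the R function in the question takes a correlation matrix. For standard normal variables (zero mean, unit variance) the two matrices coincide, so a thin wrapper is enough to mimic that call. rnorm_cor below is a hypothetical helper, loosely modelled on the description in the question rather than on the exact CreditMetrics API.

```python
import numpy as np

def rnorm_cor(n_samples, n_vars, corr, rng=None):
    """Draw n_samples rows of n_vars standard normal variables with the
    given correlation matrix (hypothetical helper, not a library function)."""
    corr = np.asarray(corr, dtype=float)
    if corr.shape != (n_vars, n_vars):
        raise ValueError("corr must be an n_vars x n_vars matrix")
    rng = np.random.default_rng() if rng is None else rng
    # With unit variances, the correlation matrix is also the covariance matrix.
    return rng.multivariate_normal(np.zeros(n_vars), corr, size=n_samples)

corr = [[1.0, 0.5], [0.5, 1.0]]
samples = rnorm_cor(50_000, 2, corr, rng=np.random.default_rng(42))
print(samples.shape)  # (50000, 2)
# rowvar=False: each column is one variable.
print(np.corrcoef(samples, rowvar=False))
```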

Example:

import numpy as np
import matplotlib.pyplot as plt


num_samples = 400

# The desired mean values of the sample.
mu = np.array([5.0, 0.0, 10.0])

# The desired covariance matrix.
r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

# Generate the random samples.
rng = np.random.default_rng()
y = rng.multivariate_normal(mu, r, size=num_samples)


# Plot various projections of the samples.
plt.subplot(2,2,1)
plt.plot(y[:,0], y[:,1], 'b.', alpha=0.25)
plt.plot(mu[0], mu[1], 'ro', ms=3.5)
plt.ylabel('y[1]')
plt.axis('equal')
plt.grid(True)

plt.subplot(2,2,3)
plt.plot(y[:,0], y[:,2], 'b.', alpha=0.25)
plt.plot(mu[0], mu[2], 'ro', ms=3.5)
plt.xlabel('y[0]')
plt.ylabel('y[2]')
plt.axis('equal')
plt.grid(True)

plt.subplot(2,2,4)
plt.plot(y[:,1], y[:,2], 'b.', alpha=0.25)
plt.plot(mu[1], mu[2], 'ro', ms=3.5)
plt.xlabel('y[1]')
plt.axis('equal')
plt.grid(True)

plt.show()

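Result: [figure: scatter plots of the pairwise projections y[1] vs y[0], y[2] vs y[0], and y[2] vs y[1], with the mean marked by a red dot]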
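Besides eyeballing the plots, you can check numerically that the samples have the requested statistics. This sketch reuses mu and r from the example above, draws a larger sample so the estimates are tighter, and also derives the correlation matrix implied by the covariance matrix r (dividing each entry by the product of the corresponding standard deviations).

```python
import numpy as np

# Same mean vector and covariance matrix as in the example above.
mu = np.array([5.0, 0.0, 10.0])
r = np.array([
    [ 3.40, -2.75, -2.00],
    [-2.75,  5.50,  1.50],
    [-2.00,  1.50,  1.25],
])

rng = np.random.default_rng(0)
y = rng.multivariate_normal(mu, r, size=200_000)

# Sample mean and covariance (rowvar=False: each column is a variable).
print(np.round(y.mean(axis=0), 2))
print(np.round(np.cov(y, rowvar=False), 2))

# Correlation matrix implied by r: r[i, j] / (sd[i] * sd[j]).
sd = np.sqrt(np.diag(r))
print(np.round(r / np.outer(sd, sd), 2))
```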

See also CorrelatedRandomSamples in the SciPy Cookbook.

answered Sep 20 '22 by Warren Weckesser