Compute correlation between features and target variable

What is the best way to compute the correlation between my features and the target variable? My dataframe has 1,000 rows and 40,000 columns.

Example:

import pandas as pd
df = pd.DataFrame([[1, 2, 4, 6], [1, 3, 4, 7], [4, 6, 8, 12], [5, 3, 2, 10]], columns=['Feature1', 'Feature2', 'Feature3', 'Target'])

This code works, but it takes too long on my dataframe. I only need the last column of the correlation matrix: the correlation of each feature with the target (not the pairwise feature correlations).

corr_matrix=df.corr()
corr_matrix["Target"].sort_values(ascending=False)

The np.corrcoef() function works with arrays, but can it skip the pairwise feature correlations?

asked Sep 25 '18 11:09 by Cox Tox


1 Answer

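You could call pandas' Series.corr on each feature column, which computes only the feature-target correlations and skips the pairwise feature correlations: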

df.drop("Target", axis=1).apply(lambda x: x.corr(df.Target))
answered Jan 11 '23 22:01 by w-m
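
For a frame as wide as the one in the question (40,000 columns), the per-column apply may still be slow, because it makes one Python-level call per feature. The following is a minimal sketch of a vectorized alternative, not part of the original answer: it computes the Pearson correlation of every feature with the target in one pass with NumPy. The helper name corr_with_target is hypothetical, and the sketch assumes all columns are numeric and contain no missing values (pandas skips NaN, this code does not).

```python
import numpy as np
import pandas as pd


def corr_with_target(df, target="Target"):
    """Pearson correlation of each feature column with the target column.

    Hypothetical helper; assumes numeric columns with no missing values.
    """
    features = df.drop(columns=target)
    X = features.to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)

    # Center the data; the correlation is then a normalized dot product.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    numerator = Xc.T @ yc
    denominator = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

    # A constant column yields 0/0, which is reported as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = numerator / denominator
    return pd.Series(r, index=features.columns)


df = pd.DataFrame(
    [[1, 2, 4, 6], [1, 3, 4, 7], [4, 6, 8, 12], [5, 3, 2, 10]],
    columns=["Feature1", "Feature2", "Feature3", "Target"],
)
result = corr_with_target(df).sort_values(ascending=False)
print(result)
```

pandas also offers DataFrame.corrwith, so df.drop(columns="Target").corrwith(df["Target"]) should give the same values without a hand-written helper; which of the two is faster on a given dataset is worth measuring rather than assuming.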