What is the best way to compute the correlation between my features and the target variable? My dataframe has 1,000 rows and 40,000 columns.
Example:
df = pd.DataFrame([[1, 2, 4, 6], [1, 3, 4, 7], [4, 6, 8, 12], [5, 3, 2, 10]], columns=['Feature1', 'Feature2', 'Feature3', 'Target'])
This code works, but it takes too long on my dataframe. I only need the last column of the correlation matrix: the correlation of each feature with the target (not the pairwise feature correlations).
corr_matrix=df.corr()
corr_matrix["Target"].sort_values(ascending=False)
The np.corrcoef() function works on arrays, but can it skip the pairwise feature correlations?
The correlation coefficient is determined by dividing the covariance by the product of the two variables' standard deviations. Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together.
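The definition above can be applied to every feature at once with NumPy, without building the feature-by-feature matrix. The following is a minimal sketch that reuses the question's example dataframe; the variable names (X, y, Xc, yc, target_corr) are illustrative, and it assumes all columns are numeric, contain no missing values, and are not constant (a constant column has zero standard deviation and would produce NaN).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4, 6], [1, 3, 4, 7], [4, 6, 8, 12], [5, 3, 2, 10]],
    columns=["Feature1", "Feature2", "Feature3", "Target"],
)

X = df.drop(columns="Target").to_numpy(dtype=float)  # shape (n_rows, n_features)
y = df["Target"].to_numpy(dtype=float)               # shape (n_rows,)

# Center every feature column and the target around their means.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Sample covariance of each feature with the target (one matrix-vector product).
cov_xy = Xc.T @ yc / (len(y) - 1)

# Pearson r = cov(x, y) / (std(x) * std(y)), using the same ddof as the covariance.
r = cov_xy / (X.std(axis=0, ddof=1) * y.std(ddof=1))

target_corr = pd.Series(r, index=df.columns.drop("Target")).sort_values(ascending=False)
print(target_corr)
```

The work here grows linearly with the number of features, whereas the full correlation matrix grows quadratically, so this approach is likely to be much cheaper on a wide dataframe. It is worth timing on the real data before relying on it.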
Data correlation: a way to understand the relationships between the variables and attributes in your dataset. Using correlation, you can get insights such as whether one or more attributes depend on another attribute or may be a cause of another attribute.
Using the corr() method: With the pandas correlation method, we can see the correlations between all numerical columns in the DataFrame. Since it is a method, all we have to do is call it on the DataFrame. The return value is a new DataFrame containing the correlation for each pair of columns.
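To make the cost of the full matrix concrete, here is a small sketch using the example dataframe from the question. The memory figure is a back-of-the-envelope estimate that assumes float64 entries and ignores any temporary arrays pandas may allocate.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4, 6], [1, 3, 4, 7], [4, 6, 8, 12], [5, 3, 2, 10]],
    columns=["Feature1", "Feature2", "Feature3", "Target"],
)

# corr() returns one row and one column per numeric column of df.
corr_matrix = df.corr()
print(corr_matrix.shape)  # (4, 4)

# Rough size of the same matrix for 40,000 columns, at 8 bytes per float64 entry.
n_columns = 40_000
full_matrix_gb = n_columns ** 2 * 8 / 1e9
print(full_matrix_gb)  # about 12.8 GB
```

Only one column of that matrix is needed when the goal is the correlation with the target.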
You could use pandas corr on each column:
df.drop("Target", axis=1).apply(lambda x: x.corr(df.Target))
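pandas also provides DataFrame.corrwith, which expresses the same per-column computation without a lambda. The sketch below reuses the question's example dataframe; the names features and target_corr are illustrative. Whether it is faster than apply depends on the pandas version, so treat it as a convenience and measure on the real data.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4, 6], [1, 3, 4, 7], [4, 6, 8, 12], [5, 3, 2, 10]],
    columns=["Feature1", "Feature2", "Feature3", "Target"],
)

# Correlate every feature column with the target Series (Pearson by default).
features = df.drop(columns="Target")
target_corr = features.corrwith(df["Target"]).sort_values(ascending=False)
print(target_corr)
```

The result is a Series indexed by feature name, so it can be sorted or filtered in the same way as corr_matrix["Target"] in the question.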