I have a pandas dataframe with a size of 235607 records, and 94 attributes. I am very new python I was able to create a correlation matrix between all of the attributes but it is a lot to look through individually. I tried writing a for loop to print a list of the columns with a correlation greater than 80% but I keep getting the error "'DataFrame' object has no attribute 'c1'"
This is the code I used to create the correlation between the attributes as well as the sample for loop. Thank you in advance for your help :-
corr = data.corr() # data is the pandas dataframe
c1 = corr.abs().unstack()
c1.sort_values(ascending = False)
drop = [cols for cols in upper.c1 if any (upper[c1] > 0.80)]
drop
Sort in place, if you need to use the same variable c1 and then just grab the variables-names pair, using a comprehensive list using the indexes
c1.sort_values(ascending=True, inplace=True)
columns_above_80 = [(col1, col2) for col1, col2 in c1.index if c1[col1,col2] > 0.8 and col1 != col2]
Edit: Added col1 != col2 in the comprehensive list so you don't grab the auto-correlation
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With