Correlation among multiple categorical variables (Pandas)



I have a data set made of 22 categorical variables (non-ordered). I would like to visualize their correlation in a nice heatmap. Since the Pandas built-in function

DataFrame.corr(method='pearson', min_periods=1)

only implements correlation coefficients for numerical variables (Pearson, Kendall, Spearman), I have to aggregate the data myself to perform a chi-square test or something like it, and I am not quite sure which function to use to do it in one elegant step (rather than iterating through all the cat1*cat2 pairs). To be clear, this is what I would like to end up with (a dataframe):

         cat1  cat2  cat3
  cat1|  coef  coef  coef
  cat2|  coef  coef  coef
  cat3|  coef  coef  coef

Any ideas with pd.pivot_table or something in the same vein?

thanks in advance D.


asked Dec 30 '17 15:12

zar3bski

1 Answer

You can use pd.factorize to encode each column as integer codes and then call corr on the result:

df.apply(lambda x : pd.factorize(x)[0]).corr(method='pearson', min_periods=1)
Out[32]:
     a    c    d
a  1.0  1.0  1.0
c  1.0  1.0  1.0
d  1.0  1.0  1.0

Data input

import pandas as pd
df=pd.DataFrame({'a':['a','b','c'],'c':['a','b','c'],'d':['a','b','c']})

Update

from scipy.stats import chisquare
df=df.apply(lambda x : pd.factorize(x)[0])+1
pd.DataFrame([chisquare(df[x].values,f_exp=df.values.T,axis=1)[0] for x in df])
Out[123]:
     0    1    2    3
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0

df=pd.DataFrame({'a':['a','d','c'],'c':['a','b','c'],'d':['a','b','c'],'e':['a','b','c']})
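If you want a symmetric matrix of association strengths bounded between 0 and 1, one common alternative (not part of the answer above) is Cramér's V, computed from a chi-square test on each pair's contingency table. Below is a minimal sketch using pd.crosstab and scipy.stats.chi2_contingency. The helper names cramers_v and cramers_v_matrix are made up for illustration, and the code still loops over column pairs internally; it just hides the loop behind a single call.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramér's V between two categorical Series (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y)
    if min(table.shape) < 2:
        return np.nan  # undefined when either variable has a single level
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))


def cramers_v_matrix(df):
    """Pairwise Cramér's V for every pair of columns, returned as a square DataFrame."""
    cols = list(df.columns)
    # Assumption: every column has at least two levels, so the diagonal is 1.
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        out.loc[a, b] = out.loc[b, a] = cramers_v(df[a], df[b])
    return out


# Same toy data as in the update above.
df = pd.DataFrame({'a': ['a', 'd', 'c'], 'c': ['a', 'b', 'c'],
                   'd': ['a', 'b', 'c'], 'e': ['a', 'b', 'c']})
result = cramers_v_matrix(df)
```

The resulting DataFrame has the column names on both axes, so it can be passed directly to a plotting function such as seaborn.heatmap. Unlike Pearson on factorized codes, this measure does not depend on the order in which the categories happen to be encoded.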


answered Oct 09 '22 04:10

BENY
