How to plot a Cramer’s V heatmap for categorical features?

Tags: python-3.x, data-visualization, heatmap, categorical-data, bokeh

The association between categorical variables should be measured with Cramer's V. I found the following code to plot it, but I don't understand why the author plotted it for "contribution" (the contrib column), which is a numeric variable.

import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss
import seaborn as sns


def cramers_corrected_stat(confusion_matrix):
    """Calculate the bias-corrected Cramer's V for categorical-categorical association.

    Uses the correction from Wicher Bergsma,
    Journal of the Korean Statistical Society 42 (2013): 323-328.
    """
    chi2 = ss.chi2_contingency(confusion_matrix)[0]  # Pearson chi-square statistic
    n = confusion_matrix.sum().sum()                 # total number of observations
    phi2 = chi2 / n
    r, k = confusion_matrix.shape
    # Bias-corrected phi^2 and bias-corrected table dimensions
    phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
    rcorr = r - ((r - 1) ** 2) / (n - 1)
    kcorr = k - ((k - 1) ** 2) / (n - 1)
    return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))


# df is a pandas DataFrame that holds the columns listed in cols
cols = ["Party", "Vote", "contrib"]
# Start from the identity: every variable is perfectly associated with itself
corrM = np.eye(len(cols))
# there's probably a nice pandas way to do this
for col1, col2 in itertools.combinations(cols, 2):
    idx1, idx2 = cols.index(col1), cols.index(col2)
    corrM[idx1, idx2] = cramers_corrected_stat(pd.crosstab(df[col1], df[col2]))
    corrM[idx2, idx1] = corrM[idx1, idx2]  # Cramer's V is symmetric

corr = pd.DataFrame(corrM, index=cols, columns=cols)
fig, ax = plt.subplots(figsize=(7, 6))
sns.heatmap(corr, annot=True, ax=ax)
ax.set_title("Cramer V Correlation between Variables")
plt.show()
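For reference, this is the quantity cramers_corrected_stat returns, written out directly from the code: chi^2 is the Pearson chi-square statistic of the r x k contingency table and n is the number of observations; phi2corr, rcorr and kcorr in the code correspond to the tilded symbols. Without the correction, Cramer's V is simply sqrt(phi^2 / min(k-1, r-1)).

```latex
\begin{aligned}
\varphi^2 &= \frac{\chi^2}{n}, \qquad
\tilde{\varphi}^2 = \max\!\left(0,\; \varphi^2 - \frac{(k-1)(r-1)}{n-1}\right), \\
\tilde{r} &= r - \frac{(r-1)^2}{n-1}, \qquad
\tilde{k} = k - \frac{(k-1)^2}{n-1}, \\
\tilde{V} &= \sqrt{\frac{\tilde{\varphi}^2}{\min(\tilde{k}-1,\; \tilde{r}-1)}}
\end{aligned}
```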

I also came across Bokeh, but I am not sure whether it uses Cramer's V to plot the heatmap.

In my case, I have two categorical features: the first has 2 categories and the second has 37. How can I plot a Cramer's V heatmap for them?

Part of my dataset is here.

Thanks in advance.


asked Aug 15 '18 13:08

ebrahimi

1 Answer

What's the problem? The code is correct.

Here corr is the matrix of pairwise Cramer's V values between the variables; ax is only the Matplotlib axes the heatmap is drawn on. Using "contribution" is not correct, and the article the code comes from says so itself:

"This isn't right to do on the Contribution variable, but we'll do more with a model later."

The author includes this variable only as an example. In your case, what is the reason to plot a Cramer's V heatmap at all? As far as I can see, you have just two variables, so you will get only one Cramer's V coefficient.
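To illustrate the point about two variables, here is a minimal sketch that computes a single, plain (not bias-corrected) Cramer's V for one pair of categorical columns using only pandas and NumPy. The DataFrame and the column names feature_a and feature_b are hypothetical stand-ins for the asker's 2-category and 37-category features. The chi-square statistic is computed by hand without the Yates continuity correction, which should match ss.chi2_contingency(table, correction=False)[0].

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Plain (not bias-corrected) Cramer's V between two categorical Series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)  # r x k contingency table
    n = table.sum()
    # Expected counts under independence: outer product of the margins divided by n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-square, no Yates correction
    r, k = table.shape
    return float(np.sqrt((chi2 / n) / min(r - 1, k - 1)))


# Hypothetical data: one 2-category column and one 3-category column
df = pd.DataFrame({
    "feature_a": ["yes", "yes", "no", "no", "yes", "no"],
    "feature_b": ["red", "red", "blue", "blue", "green", "green"],
})

print(round(cramers_v(df["feature_a"], df["feature_b"]), 3))  # 0.816
```

With two columns, this one number is all a "heatmap" would show off the diagonal.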

But of course you can run the same code on your data to get a Cramer's V heatmap.
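A minimal sketch of a reusable version of the question's loop, assuming every listed column is categorical: it fills a symmetric DataFrame of bias-corrected Cramer's V values, using the same Bergsma correction as cramers_corrected_stat, for any number of columns. The function names corrected_cramers_v and cramers_v_matrix and the example data are hypothetical, and the diagonal is set to 1 by convention. Chi-square is computed by hand without the Yates continuity correction; ss.chi2_contingency applies that correction by default for 2 x 2 tables, so values for 2 x 2 tables can differ somewhat from the question's code.

```python
import itertools

import numpy as np
import pandas as pd


def corrected_cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Bias-corrected Cramer's V (Bergsma 2013) between two categorical Series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-square, no Yates correction
    r, k = table.shape
    phi2corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    rcorr = r - (r - 1) ** 2 / (n - 1)
    kcorr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2corr / min(kcorr - 1, rcorr - 1)))


def cramers_v_matrix(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Symmetric matrix of corrected Cramer's V; diagonal set to 1 by convention."""
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for a, b in itertools.combinations(cols, 2):
        # Fill both triangles with the same value
        out.loc[a, b] = out.loc[b, a] = corrected_cramers_v(df[a], df[b])
    return out


# Hypothetical example: three categorical columns
df = pd.DataFrame({
    "Party": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "Vote": ["y", "y", "n", "n", "y", "n", "n", "y"],
    "Region": ["N", "S", "N", "S", "N", "S", "N", "S"],
})
print(cramers_v_matrix(df, ["Party", "Vote", "Region"]).round(2))
```

To draw the heatmap, pass the returned DataFrame to sns.heatmap(matrix, annot=True, vmin=0, vmax=1) in place of corr in the question's code; fixing vmin and vmax pins the colour scale to Cramer's V's 0 to 1 range. For two columns the result is a 2 x 2 matrix with a single informative off-diagonal value.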


answered Oct 15 '22 16:10

Edward
