How to check for correlation among continuous and categorical variables?

Tags: python, linear-regression, correlation, categorical-data

I have a dataset that includes binary categorical variables and continuous variables. I'm trying to fit a linear regression model to predict a continuous variable. How can I check for correlation between the categorical variables and the continuous target variable?

Current Code:

import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escapes
df_hosp = pd.read_csv(r'C:\Users\LAPPY-2\Desktop\LengthOfStay.csv')

data = df_hosp[['lengthofstay', 'male', 'female', 'dialysisrenalendstage', 'asthma',
                'irondef', 'pneum', 'substancedependence',
                'psychologicaldisordermajor', 'depress', 'psychother',
                'fibrosisandother', 'malnutrition', 'hemo']]
print(data.corr())

All of the variables apart from lengthofstay are categorical. Should this work?

asked Jun 22 '17 08:06

funnyguy

1 Answer

Convert your categorical variables into dummy variables and put the result in a NumPy array. For example:

data.csv:

age,size,color_head
4,50,black
9,100,blonde
12,120,brown
17,160,black
18,180,brown

Extract data:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')

df:

   age  size color_head
0    4    50      black
1    9   100     blonde
2   12   120      brown
3   17   160      black
4   18   180      brown

Convert categorical variable color_head into dummy variables:

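# dtype=int keeps the dummies numeric (recent pandas versions default to bool)
df_dummies = pd.get_dummies(df['color_head'], dtype=int)
# Drop one category (here the last, 'brown') because it is implied by the others
df_dummies = df_dummies.drop(columns=df_dummies.columns[-1])
df_new = pd.concat([df, df_dummies], axis=1).drop(columns='color_head')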

df_new:

   age  size  black  blonde
0    4    50      1       0
1    9   100      0       1
2   12   120      0       0
3   17   160      1       0
4   18   180      0       0
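One dummy column is dropped because the full set is redundant: every row has exactly one category, so the indicators always sum to 1 and any one of them can be rebuilt from the rest. A small sketch of that check, assuming the same five hair colours as in data.csv (the variable names here are only illustrative):

```python
import pandas as pd

# The color_head column from the sample data
colors = pd.Series(['black', 'blonde', 'brown', 'black', 'brown'],
                   name='color_head')

# One indicator column per category, as integers
full = pd.get_dummies(colors, dtype=int)

# Each row belongs to exactly one category, so the indicators sum to 1
row_sums = full.sum(axis=1)
print(row_sums.tolist())

# Drop the last column ('brown') and rebuild it from the remaining ones
reduced = full.iloc[:, :-1]
recovered = 1 - reduced.sum(axis=1)
print((recovered == full['brown']).all())
```

Keeping all three columns would add nothing to the correlation matrix beyond what the remaining two already encode, and in a linear regression with an intercept it would make the design matrix rank-deficient.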

Put that in a NumPy array:

x = df_new.values

Compute the correlation:

correlation_matrix = np.corrcoef(x.T)
print(correlation_matrix)

Output:

array([[ 1.        ,  0.99574691, -0.23658011, -0.28975028],
       [ 0.99574691,  1.        , -0.30318496, -0.24026862],
       [-0.23658011, -0.30318496,  1.        , -0.40824829],
       [-0.28975028, -0.24026862, -0.40824829,  1.        ]])
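The same kind of matrix can probably be obtained without leaving pandas, which also keeps the column labels. The following is a minimal sketch, assuming the five-row sample above is inlined through io.StringIO instead of read from disk. Note that drop_first=True drops the first category (black) rather than the last one, so the dummy columns differ from the ones used above.

```python
import io
import pandas as pd

# Same sample as data.csv, inlined so the snippet is self-contained
csv_text = """age,size,color_head
4,50,black
9,100,blonde
12,120,brown
17,160,black
18,180,brown
"""
df = pd.read_csv(io.StringIO(csv_text))

# Encode color_head as 0/1 columns; drop_first=True drops 'black'
df_encoded = pd.get_dummies(df, columns=['color_head'],
                            drop_first=True, dtype=float)

# Pearson correlation between every pair of columns, with labels
corr = df_encoded.corr()
print(corr.round(3))
```

Selecting a single column, for example corr['age'], gives the correlation of every other variable with that one, which is usually what is wanted when screening predictors against a target.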

See: numpy.corrcoef
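For 0/1 columns such as the ones in the question, the Pearson coefficient coincides with the point-biserial correlation, so calling corr() on them gives a meaningful number. A rough check of that claim, using the age values and the black indicator from the sample above (the hand-written formula is the textbook point-biserial definition, not something taken from the question's data):

```python
import numpy as np

age = np.array([4, 9, 12, 17, 18], dtype=float)  # continuous variable
black = np.array([1, 0, 0, 1, 0])                # binary indicator

# Point-biserial: difference of the two group means, scaled by the
# population standard deviation and the group proportions
m1 = age[black == 1].mean()
m0 = age[black == 0].mean()
n1 = (black == 1).sum()
n0 = (black == 0).sum()
n = age.size
r_pb = (m1 - m0) / age.std() * np.sqrt(n1 * n0 / n**2)

# Plain Pearson correlation on the same two arrays
r_pearson = np.corrcoef(age, black)[0, 1]
print(r_pb, r_pearson)
```

Both values should agree up to floating-point error. If male and female in the question are exact complements of each other, their mutual correlation is -1 and one of the two is redundant as a regression predictor.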

answered Sep 18 '22 18:09

glegoux
