Calculate correlation between columns of strings

I've got a df that contains the columns profession and media. I would like to calculate the correlation between those two columns.

Is there a short hack for calculating the correlation between columns of strings? Or do I have to transform each profession and media value to a number and then calculate the correlation with .corr()?

I found a similar question (Is there a way to get correlation with string data and a numerical value in pandas?) but I would like to check the string, not each word within the string.

df

     profession   media
0  media lawyer   print
1       student  online
2       student   print
3     professor  online
4  media lawyer  online
Hannah asked Jul 09 '18 08:07
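To try the snippets on this page, the example frame from the question can be rebuilt as follows. This is a minimal sketch; the values are copied from the table above, and the variable name df matches the question.

```python
import pandas as pd

# Example frame from the question: two columns of strings
df = pd.DataFrame({
    "profession": ["media lawyer", "student", "student", "professor", "media lawyer"],
    "media": ["print", "online", "print", "online", "online"],
})
print(df)
```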


People also ask

How do you find the correlation between columns?

You can get the correlation between two columns with the corr() method: dataframe['first_column'].corr(dataframe['second_column']), where dataframe is the input DataFrame and first_column is correlated with second_column.

How do you find the correlation between two columns in a data frame?

You can also get the correlation between all the columns of a pandas DataFrame. To do this, call corr() on the entire DataFrame; the result is a DataFrame of pairwise correlation values between all the columns. Note that by default, corr() computes Pearson's correlation.
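A small sketch of the whole-frame call, using hypothetical numeric columns a, b and c invented purely for illustration. It shows the default Pearson matrix and the method parameter, which also accepts "spearman" and "kendall". In recent pandas versions (2.0 and later), calling corr() on a frame that still contains string columns raises an error unless numeric_only=True is passed, which is why string columns like those in the question need converting first.

```python
import pandas as pd

# Hypothetical numeric data, used only to illustrate the call
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # b = 2 * a, so corr(a, b) is 1
    "c": [5, 3, 4, 1, 2],
})

# Pairwise Pearson correlations between all columns (method="pearson" is the default)
corr_matrix = df.corr()
print(corr_matrix)

# Rank-based alternative supported by the same method
spearman_matrix = df.corr(method="spearman")
print(spearman_matrix)
```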

How do you calculate correlation between columns in pandas?

Initialize two variables, col1 and col2, and assign them the names of the columns whose correlation you want. Compute the correlation between them with df[col1].corr(df[col2]), save the result in a variable, corr, and print corr.
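The steps above can be sketched as follows; the columns x and y and their values are hypothetical and exist only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical numeric data for illustration
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [1.5, 3.5, 5.5, 7.5]})

col1 = "x"  # first column to correlate
col2 = "y"  # second column to correlate

# Series.corr computes the Pearson correlation by default, excluding missing values
corr = df[col1].corr(df[col2])
print(corr)
```

Because y is an exact linear function of x here, the printed value is (up to floating-point rounding) 1.0.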

How do you find the correlation between two columns in a Jupyter notebook?

Use the pandas corr() method to get the correlation between two columns; it works the same way in a notebook cell as in any other Python session.


1 Answer

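You can convert the columns to the categorical dtype, replace each string with its integer category code, and then calculate the correlation:

import pandas as pd

# Replace each distinct string with an integer code (codes follow the sorted category order)
df['profession'] = df['profession'].astype('category').cat.codes
df['media'] = df['media'].astype('category').cat.codes
df.corr()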
Sreekiran A R answered Oct 28 '22 06:10
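One caveat, not part of the original answer: astype('category') assigns codes in sorted order of the labels, so a Pearson coefficient computed on .cat.codes depends on an arbitrary ordering of nominal values (renaming a profession can change the result). For two nominal columns, a commonly used association measure is Cramér's V, computed from the chi-squared statistic of their contingency table. It ranges from 0 (no association) to 1 (perfect association) and does not depend on how the categories are ordered. Below is a minimal sketch using only pandas and NumPy; the helper name cramers_v is hypothetical, and no small-sample bias correction is applied.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V (no bias correction) between two categorical Series.

    cramers_v is a hypothetical helper written for this example.
    """
    # Observed counts: one row per category of x, one column per category of y
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    r, k = table.shape
    if min(r, k) < 2:
        # V is undefined when either column has only one distinct value
        return float("nan")
    # Expected counts under independence: row total * column total / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))


# The example frame from the question
df = pd.DataFrame({
    "profession": ["media lawyer", "student", "student", "professor", "media lawyer"],
    "media": ["print", "online", "print", "online", "online"],
})
v = cramers_v(df["profession"], df["media"])
print(round(v, 3))  # 0.408 for this five-row example
```

With only five rows the value says little about real data; on a larger frame the same call gives an order-independent measure of how strongly profession and media are associated.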