I've got a df that contains the columns profession and media. I would like to calculate the correlation between those two columns.
Is there a quick way to calculate the correlation between columns of strings? Or do I have to transform each profession and media value to a number and then calculate the correlation with .corr()?
I found a similar question (Is there a way to get correlation with string data and a numerical value in pandas?), but I want to treat each whole string as a single value, not each word within the string.
df
     profession   media
0  media lawyer   print
1       student  online
2       student   print
3     professor  online
4  media lawyer  online
The corr() function gives the correlation between two columns of a DataFrame. In dataframe['first_column'].corr(dataframe['second_column']), dataframe is the input DataFrame and first_column is correlated with second_column.
You can also get the correlation between all the columns of a pandas DataFrame. To do this, call corr() on the entire DataFrame; the result is a DataFrame of pairwise correlation values between all the columns. Note that by default, corr() returns Pearson's correlation.
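As a minimal sketch of the whole-DataFrame call described above: the column names a, b, c and their values are hypothetical, chosen only for illustration. The method argument of corr() also accepts 'spearman' and 'kendall' if Pearson is not what you want.

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

# Pairwise Pearson correlation between all columns (the default method)
pairwise = df.corr()
print(pairwise)

# Rank-based alternative, selected through the method argument
spearman = df.corr(method="spearman")
print(spearman)
```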
Initialize two variables, col1 and col2, and assign them the names of the columns whose correlation you want to find. Compute the correlation between them with df[col1].corr(df[col2]) and save the value in a variable, corr. Then print corr.
Use corr() to get the correlation between two columns.
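The col1/col2 steps described above could look roughly like this. The column names hours and score and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 70, 74],
})

# Names of the two columns to compare
col1 = "hours"
col2 = "score"

# Series.corr returns a single float (Pearson by default)
corr = df[col1].corr(df[col2])
print(corr)
```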
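You can convert both columns to the categorical dtype, replace each string with its integer category code, and then call corr():
# Replace each distinct string with its integer category code
df['profession'] = df['profession'].astype('category').cat.codes
df['media'] = df['media'].astype('category').cat.codes
# Pearson correlation between the encoded columns
df.corr()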
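For reference, here is a self-contained sketch of this approach applied to the sample data from the question. It writes the codes to a new DataFrame (the name encoded is an assumption of this sketch) so the original strings are kept. Note that the codes follow the sorted order of the labels, so for unordered categories the resulting number depends on that arbitrary ordering and should be read with care.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "profession": ["media lawyer", "student", "student",
                   "professor", "media lawyer"],
    "media": ["print", "online", "print", "online", "online"],
})

# Encode each column as integer category codes; codes follow the
# sorted order of the labels (e.g. 'online' -> 0, 'print' -> 1)
encoded = df.apply(lambda s: s.astype("category").cat.codes)
print(encoded)

# Pearson correlation between the encoded columns
result = encoded.corr()
print(result)
```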