
Correlation between two non-numeric columns in a Pandas DataFrame

I load my data from a SQL query into a pandas DataFrame. The data looks like:

     group phone_brand
0      M32-38          小米
1      M32-38          小米
2      M32-38          小米
3      M29-31          小米
4      M29-31          小米
5      F24-26        OPPO
6      M32-38          酷派
7      M32-38          小米
8      M32-38        vivo
9      F33-42          三星
10     M29-31          华为
11     F33-42          华为
12     F27-28          三星
13     M32-38          华为
14       M39+         艾优尼
15     F27-28          华为
16     M32-38          小米
17     M32-38          小米
18       M39+          魅族
19     M32-38          小米
20     F33-42          三星
21     M23-26          小米
22     M23-26          华为
23     M27-28          三星
24     M29-31          小米
25     M32-38          三星
26     M32-38          三星
27     F33-42          三星
28     M32-38          三星
29     M32-38          三星
...       ...         ...
74809  M27-28          华为
74810  M29-31         TCL

Now I want to find the correlation between these two columns and the frequency of their values, and show both in a visualization with Matplotlib. I tried something like:

DataFrame.plot(style='o')
plt.show() 

How can I visualize this correlation in the simplest way?

madik_atma asked Oct 29 '17 15:10




1 Answer

To quickly get a correlation, encode each column's values as integer codes with factorize() and call corr() on the result:

df.apply(lambda x: x.factorize()[0]).corr()

                group  phone_brand
group        1.000000     0.427941
phone_brand  0.427941     1.000000
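
A minimal, self-contained sketch of the same idea, run on a small made-up sample (the six rows below are illustrative, not the asker's full table). It prints the integer codes that factorize() produces before correlating them. Those codes are assigned in order of first appearance rather than by any natural ordering of the categories, so the coefficient changes if the rows are reordered and is probably best read as a rough indicator only.

```python
import pandas as pd

# Hypothetical sample with the same two columns as the question's data
df = pd.DataFrame({
    "group": ["M32-38", "M32-38", "M29-31", "F24-26", "M32-38", "F33-42"],
    "phone_brand": ["小米", "小米", "小米", "OPPO", "酷派", "三星"],
})

# factorize() returns (codes, uniques); keep only the integer codes,
# which are numbered in order of first appearance within each column
codes = df.apply(lambda col: col.factorize()[0])

# Pearson correlation between the two columns of integer codes
corr = codes.corr()

print(codes)
print(corr)
```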

Heat map

import pandas as pd
import seaborn as sns

# Cell color = number of rows with that (group, phone_brand) pair
sns.heatmap(pd.crosstab(df.group, df.phone_brand))
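
Since the question asks about Matplotlib and frequencies specifically, here is a hedged sketch of the same cross-tabulation drawn with plain Matplotlib instead of seaborn. The sample rows are made up for illustration, and the `shares` table (per-group proportions via `normalize="index"`) is an optional extra, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with the same two columns as the question's data
df = pd.DataFrame({
    "group": ["M32-38", "M32-38", "M29-31", "F24-26", "M32-38", "F33-42"],
    "phone_brand": ["小米", "小米", "小米", "OPPO", "酷派", "三星"],
})

# Rows: group, columns: phone_brand, cells: number of rows with that pair
counts = pd.crosstab(df["group"], df["phone_brand"])

# Share of each brand within a group (each row sums to 1)
shares = pd.crosstab(df["group"], df["phone_brand"], normalize="index")

# Draw the count table as a heat map with labeled axes
fig, ax = plt.subplots()
im = ax.imshow(counts.to_numpy(), cmap="viridis", aspect="auto")
ax.set_xticks(range(counts.shape[1]))
ax.set_xticklabels(counts.columns)
ax.set_yticks(range(counts.shape[0]))
ax.set_yticklabels(counts.index)
ax.set_xlabel("phone_brand")
ax.set_ylabel("group")
fig.colorbar(im, ax=ax, label="count")
```

Call `plt.show()` afterwards to display the figure. The Chinese brand names only render correctly if Matplotlib is configured with a font that includes those glyphs.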


piRSquared answered Oct 16 '22 23:10
