I load my data from a SQL table into a pandas DataFrame. The data looks like this:
group phone_brand
0 M32-38 小米
1 M32-38 小米
2 M32-38 小米
3 M29-31 小米
4 M29-31 小米
5 F24-26 OPPO
6 M32-38 酷派
7 M32-38 小米
8 M32-38 vivo
9 F33-42 三星
10 M29-31 华为
11 F33-42 华为
12 F27-28 三星
13 M32-38 华为
14 M39+ 艾优尼
15 F27-28 华为
16 M32-38 小米
17 M32-38 小米
18 M39+ 魅族
19 M32-38 小米
20 F33-42 三星
21 M23-26 小米
22 M23-26 华为
23 M27-28 三星
24 M29-31 小米
25 M32-38 三星
26 M32-38 三星
27 F33-42 三星
28 M32-38 三星
29 M32-38 三星
... ... ...
74809 M27-28 华为
74810 M29-31 TCL
Now I want to find the correlation and the frequency between these two columns, and show them in a visualization with Matplotlib. I tried something like:
df.plot(style='o')
plt.show()
What is the simplest way to visualize this correlation?
You can use the pandas corr() function to get the correlation between columns of a DataFrame. If you apply corr() to two pandas columns (that is, two pandas Series), it returns a single value, which by default is the Pearson correlation between the two columns.
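As a rough sketch of the two-column case, the snippet below calls Series.corr on made-up numeric data. The frame df_num and its columns x and y are assumptions for illustration, not the data from the question.

```python
import pandas as pd

# Hypothetical numeric data, made up for illustration only.
df_num = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

# Series.corr returns a single float; the default method is Pearson.
r = df_num["x"].corr(df_num["y"])
print(r)
```

Because y is an exact linear function of x here, the coefficient comes out at (or within floating-point error of) 1.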
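Pandas DataFrame.corr() computes the pairwise correlation of all columns in the DataFrame. NA values are excluded automatically. Non-numeric columns are not part of the result: older pandas versions drop them silently, while pandas 2.0 and later raise an error unless you pass numeric_only=True or select the numeric columns yourself.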
Correlation between all the columns of a DataFrame
You can also get the correlation between all the columns of a DataFrame. To do this, apply corr() to the entire DataFrame; the result is a DataFrame of pairwise correlation values between all the columns.
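A minimal sketch of the whole-frame case, assuming a small made-up frame named mixed with two numeric columns and one string column. The numeric columns are selected explicitly so the call behaves the same regardless of how the installed pandas version treats non-numeric columns.

```python
import pandas as pd

# Hypothetical frame: two numeric columns and one string column.
mixed = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [4.0, 3.0, 2.0, 1.0],
    "label": ["p", "q", "p", "q"],
})

# Keep only numeric columns, then compute the pairwise correlation matrix.
corr_matrix = mixed.select_dtypes(include="number").corr()
print(corr_matrix)
```

The result is a square DataFrame indexed by column name, so a single pair can be read with corr_matrix.loc["a", "b"].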
pandas’ DataFrame class has the method corr(), which computes one of three correlation coefficients between pairs of variables, selected with the method parameter: Pearson (the default), Kendall Tau, or Spearman. The coefficients computed by all three methods range from -1 to +1.
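To illustrate how the choice of method matters, here is a small sketch comparing Pearson and Spearman on made-up data where y grows monotonically but not linearly with x. The frame curve is an assumption for illustration. Kendall is left out only because pandas relies on SciPy for it, which may not be installed.

```python
import pandas as pd

# Hypothetical data: y = x**3 is monotonic in x but not linear.
curve = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
curve["y"] = curve["x"] ** 3

# Pearson measures linear association; Spearman measures rank association.
pearson = curve.corr(method="pearson")
spearman = curve.corr(method="spearman")

print(pearson.loc["x", "y"])
print(spearman.loc["x", "y"])
```

Spearman treats the relationship as perfect because the ranks agree exactly, while Pearson is lower because the points do not fall on a straight line.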
To quickly get a correlation, encode each column as integer codes with factorize() and call corr() on the result:
df.apply(lambda x: x.factorize()[0]).corr()
                group  phone_brand
group        1.000000     0.427941
phone_brand  0.427941     1.000000
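To see what the factorize step does, the sketch below applies it to a few rows shaped like the data in the question. The frame sample is a hand-picked subset for illustration. factorize assigns integer codes in order of first appearance, so the codes (and therefore the coefficient computed from them) depend on an arbitrary ordering of the categories; the number is best read as a rough indicator rather than a rigorous measure of association.

```python
import pandas as pd

# A few rows shaped like the question's data (hand-picked for illustration).
sample = pd.DataFrame({
    "group": ["M32-38", "M29-31", "F24-26", "M32-38"],
    "phone_brand": ["小米", "小米", "OPPO", "酷派"],
})

# factorize() returns (codes, uniques); [0] keeps only the integer codes.
codes = sample.apply(lambda col: col.factorize()[0])
print(codes)

# The codes are ordinary integers, so corr() can now be applied.
print(codes.corr())
```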
Heat map
import pandas as pd
import seaborn as sns
# Heat map of how often each (group, phone_brand) pair occurs
sns.heatmap(pd.crosstab(df.group, df.phone_brand))
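Since the question asks for Matplotlib, the same frequency table can also be drawn without seaborn. This is only a sketch: the frame sample is a small made-up stand-in for the real data, and imshow is one of several reasonable ways to draw the counts.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample shaped like the data in the question.
sample = pd.DataFrame({
    "group": ["M32-38", "M32-38", "M29-31", "F24-26", "M32-38", "F33-42"],
    "phone_brand": ["小米", "小米", "小米", "OPPO", "酷派", "三星"],
})

# Rows are groups, columns are brands, cells are co-occurrence counts.
counts = pd.crosstab(sample["group"], sample["phone_brand"])

# Draw the count table as an image, with one tick per category.
fig, ax = plt.subplots()
image = ax.imshow(counts.to_numpy(), aspect="auto")
ax.set_xticks(range(counts.shape[1]))
ax.set_xticklabels(counts.columns)
ax.set_yticks(range(counts.shape[0]))
ax.set_yticklabels(counts.index)
ax.set_xlabel("phone_brand")
ax.set_ylabel("group")
fig.colorbar(image, ax=ax, label="count")
```

Call plt.show() afterwards to display the figure. Chinese brand names need a font with CJK glyphs to render correctly in the tick labels.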