
Pandas dataframe to count matrix

This must be obvious, but I couldn't find an easy solution.

I have a pandas DataFrame like this:

actual | predicted
------ + ---------
Apple  | Apple
Apple  | Apple
Apple  | Banana
Banana | Orange
Orange | Apple

I want this:

       |  Apple  | Banana  | Orange
------ + ------- + ------- + -------
Apple  |  2      | 1       | 0
Banana |  0      | 0       | 1
Orange |  1      | 0       | 0
Gregor Sturm asked Nov 28 '16 08:11




1 Answer

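You can use groupby, aggregate with size, and then unstack the resulting MultiIndex:

# Count each (actual, predicted) pair, then pivot 'predicted' into columns
counts = df.groupby(['actual', 'predicted']).size().unstack(fill_value=0)
print(counts)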
predicted  Apple  Banana  Orange
actual                          
Apple          2       1       0
Banana         0       0       1
Orange         1       0       0
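
As a self-contained sketch of the groupby approach, the snippet below rebuilds the question's sample data and keeps the result under a separate name, `counts` (a name chosen here for illustration), so the original frame stays available. The `reindex` step is an optional addition and not part of the answer itself: it assumes you want a square matrix even when some label never occurs in one of the two columns, in which case the plain `unstack` result would lack that row or column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "actual":    ["Apple", "Apple", "Apple", "Banana", "Orange"],
    "predicted": ["Apple", "Apple", "Banana", "Orange", "Apple"],
})

# Count each (actual, predicted) pair, then pivot 'predicted' into columns
counts = df.groupby(["actual", "predicted"]).size().unstack(fill_value=0)

# Optional (assumption: a square matrix is wanted): use the same labels on both axes
labels = sorted(set(df["actual"]) | set(df["predicted"]))
counts = counts.reindex(index=labels, columns=labels, fill_value=0)

print(counts)
```

With the sample data every label already appears in both columns, so the `reindex` call changes nothing here; it only matters for inputs where a class is never predicted or never occurs as the actual value.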

Another solution uses crosstab:

import pandas as pd

# crosstab counts co-occurrences of the two columns directly
counts = pd.crosstab(df.actual, df.predicted)
print(counts)
predicted  Apple  Banana  Orange
actual                          
Apple          2       1       0
Banana         0       0       1
Orange         1       0       0
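
If totals or proportions are also useful, `pd.crosstab` accepts `margins` and `normalize` arguments. The sketch below goes slightly beyond what the question asked for and is only meant as an illustration; the variable names `with_totals` and `row_share` are made up for this example.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "actual":    ["Apple", "Apple", "Apple", "Banana", "Orange"],
    "predicted": ["Apple", "Apple", "Banana", "Orange", "Apple"],
})

# Plain count matrix, as in the answer
counts = pd.crosstab(df["actual"], df["predicted"])

# margins=True appends an "All" row and an "All" column holding the totals
with_totals = pd.crosstab(df["actual"], df["predicted"], margins=True)

# normalize="index" divides each row by its sum, giving per-row fractions
row_share = pd.crosstab(df["actual"], df["predicted"], normalize="index")

print(counts)
print(with_totals)
print(row_share)
```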
jezrael answered Sep 22 '22 08:09