
Pandas: how to groupby with count with multiple levels on rows?

Tags: python, pandas

I have the following DataFrame:

| A  | B  |
|----|----|
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |

I want to count the occurrences of each B value per A value, keeping the combinations that never occur, and get the following result:

| A  | B  | Count |
|----|----|-------|
| a1 | b1 |  1    |
|    | b2 |  1    |
|    | b3 |  NaN  |
| a2 | b1 |  1    |
|    | b2 |  NaN  |
|    | b3 |  1    |

I usually do this with df.groupby(['B'])['A'].count(), but in this case, with a pivot-table-like layout, I find it confusing.

Thanks in advance.

UPDATE:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B    20422 non-null object
A             20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB

With df.groupby(['B'])['A'].value_counts().unstack().stack(dropna=False).reset_index(name="Count") I'm getting:

|  | A  | B  | Count |
|--|----|----|-------|
|0 | a1 | b1 |  1    |
|1 | a1 | b2 |  1    |
|2 | a1 | b3 |  NaN  |
|3 | a2 | b1 |  1    |
|4 | a2 | b2 |  NaN  |
|5 | a2 | b3 |  1    |

asked Mar 24 '17 11:03 by Novitoll



2 Answers

1) One way is to group on "A" and count the occurrences of each value of "B" with value_counts. Chaining unstack and then stack with dropna=False gives the desired DataFrame, with NaN for the combinations that never occur:

df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")
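
A minimal sketch of this first approach on the question's sample data. The `df` below is rebuilt from the question's table, and the last step uses `melt` rather than `stack(dropna=False)`. That is a swapped-in technique, chosen here because recent pandas releases have reworked `stack` and its `dropna` argument; the names `wide` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Count each B value within each A group, then pivot B into columns.
# Combinations that never occur show up as NaN in the wide table.
wide = df.groupby("A")["B"].value_counts().unstack()

# Return to long form; melt keeps the NaN cells instead of dropping them.
result = (wide.reset_index()
              .melt(id_vars="A", var_name="B", value_name="Count")
              .sort_values(["A", "B"])
              .reset_index(drop=True))
print(result)
```

The result should have one row per (A, B) pair, with NaN in `Count` for (a1, b3) and (a2, b2).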

2) pd.crosstab is a good alternative, provided the zero counts are replaced with np.nan after stacking (this needs numpy imported as np):

pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
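
For reference, a self-contained sketch of the crosstab variant with the imports it needs, again assuming a `df` rebuilt from the question's table. The intermediate name `table` is only illustrative; it is there to show that crosstab fills absent combinations with 0, which is why the replacement step is needed.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# crosstab builds the full A x B grid, with 0 where a pair never occurs.
table = pd.crosstab(df["A"], df["B"])

# Stack to long form, then turn the zero counts into NaN.
result = table.stack().replace({0: np.nan}).reset_index(name="Count")
print(result)
```

Skipping the `replace` step would keep zeros instead of NaN, which may be preferable if the counts are used in later arithmetic.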

Both approaches yield the same long-form table: one row per (A, B) combination, with NaN in Count where the combination does not occur.


edit1:

To display the grouped key "A" in a certain format (i.e. keep the first occurrence and replace the rest with an empty string):

df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""
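
A small sketch of what the `duplicated` mask does, assuming a hand-built `df_g` shaped like the long-form result (the literal values below are illustrative, taken from the question's expected output):

```python
import pandas as pd

nan = float("nan")

# Hand-built stand-in for the long-form result.
df_g = pd.DataFrame({"A": ["a1", "a1", "a1", "a2", "a2", "a2"],
                     "B": ["b1", "b2", "b3", "b1", "b2", "b3"],
                     "Count": [1, 1, nan, 1, nan, 1]})

# duplicated("A") is False for the first row of each A value, True afterwards.
repeats = df_g.duplicated("A")

# Blank out the repeated keys so each A value is shown only once.
df_g.loc[repeats, "A"] = ""
print(df_g)
```

After this step the "A" column no longer identifies the group on every row, so it is best applied last and only for display.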


edit2:

If you want "A" shown as a single merged cell, make it part of a MultiIndex on the DataFrame:

df.groupby('A')['B'].value_counts().unstack().stack(dropna=False
                    ).reset_index(name="Count").set_index(['A', 'B'])


answered Nov 09 '22 20:11 by Nickil Maveli


You could group by both columns and take the size of each group:

df.groupby(['A', 'B']).size()

returns:

A   B 
a1  b1    1
    b2    1
a2  b1    1
    b3    1
dtype: int64

It won't give you NaNs for non-existent combinations, though.
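
If the missing combinations are needed, one possible extension (not part of this answer as written) is to reindex the sizes against every (A, B) pair. The sketch below assumes a `df` rebuilt from the question's table; `full_index` and `result` are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Group sizes for the combinations that actually occur.
sizes = df.groupby(["A", "B"]).size()

# Every (A, B) pair that can be formed from the values in each column.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)

# reindex inserts NaN for the pairs that are missing from `sizes`.
result = sizes.reindex(full_index).reset_index(name="Count")
print(result)
```

Passing `fill_value=0` to `reindex` would give zeros instead of NaN and keep the counts as integers.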

answered Nov 09 '22 19:11 by Tim Tröndle