I have the following dataframe:
| A  | B  |
|----|----|
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |
I want to count by B per A and get the following result:
| A  | B  | Count |
|----|----|-------|
| a1 | b1 | 1     |
|    | b2 | 1     |
|    | b3 | NaN   |
| a2 | b1 | 1     |
|    | b2 | NaN   |
|    | b3 | 1     |
I usually do this with df.groupby(['B'])['A'].count(),
but in this case the result is more like a pivot table, and I find it confusing.
Thanks in advance.
Update:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B 20422 non-null object
A 20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB
With df.groupby(['B'])['A'].value_counts().unstack().stack(dropna=False).reset_index(name="Count") I'm getting:
|   | A  | B  | Count |
|---|----|----|-------|
| 0 | a1 | b1 | 1     |
| 1 | a1 | b2 | 1     |
| 2 | a1 | b3 | NaN   |
| 3 | a2 | b1 | 1     |
| 4 | a2 | b2 | NaN   |
| 5 | a2 | b3 | 1     |
You can use pandas DataFrame.groupby().count() or DataFrame.groupby().size() to group by one or more columns and compute a count for each group combination. count() counts the non-null values per column, while size() counts the rows in each group.
The count() method of a pandas DataFrame counts non-NA cells along an axis. df.count(axis=0), the default, returns one count per column; df.count(axis=1) returns one count per row. To get the total number of rows, use len(df) or df.shape[0].
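As a quick check of what the axis argument does, here is a minimal sketch. The three-row frame is made up for illustration (it is not the asker's data) and contains one missing value so that the two axes give visibly different counts.

```python
import pandas as pd

# Illustrative frame (an assumption, not the question's data); "A" has one missing value.
df = pd.DataFrame({"A": ["a1", "a2", None],
                   "B": ["b1", "b1", "b2"]})

per_column = df.count(axis=0)  # non-NA cells in each column
per_row = df.count(axis=1)     # non-NA cells in each row
n_rows = len(df)               # total number of rows, missing values included

print(per_column)
print(per_row)
print(n_rows)
```

Note that count() skips missing values, so it only equals the number of rows for columns without any NaN.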
You can group DataFrame rows into lists by calling pandas.DataFrame.groupby() on the column of interest, selecting the column whose values you want collected, and then using Series.apply(list) to get one list per group.
How do you group by multiple columns in a pandas DataFrame and compute multiple aggregations? groupby() accepts a list of column names to group by, and agg() can then apply one or several aggregate functions at the same time.
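A minimal sketch of grouping by two columns with several aggregations at once. The numeric column "C" and the output names rows, total and mean are hypothetical, added only so there is something to aggregate; they do not come from the question.

```python
import pandas as pd

# Hypothetical data: column "C" is an assumption made for this example.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2", "a1"],
                   "B": ["b1", "b1", "b2", "b3", "b1"],
                   "C": [10, 20, 30, 40, 50]})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
summary = (
    df.groupby(["A", "B"])
      .agg(rows=("C", "size"), total=("C", "sum"), mean=("C", "mean"))
      .reset_index()
)
print(summary)
```

Only the (A, B) pairs that actually occur appear in the result, which is the same limitation the question runs into.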
1) One way would be grouping on "A" and computing the distinct counts of elements under "B" using value_counts. Then a fusion of unstack and stack with dropna=False to get the desired DF:
df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")
2) pd.crosstab also provides a good alternative if we replace the zero-count elements with np.nan after stacking:
pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
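Both approaches yield the six-row frame shown in the question's update, with NaN for the (A, B) combinations that never occur.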
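To check this on the question's sample data, here is a minimal, self-contained sketch of the crosstab variant. The frame is rebuilt from the four rows in the question; the intermediate name grid is only for readability.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# crosstab builds the full A x B grid and fills absent pairs with 0.
grid = pd.crosstab(df["A"], df["B"])

# Back to long form; zeros become NaN to mark pairs that never occur.
result = grid.stack().replace({0: np.nan}).reset_index(name="Count")
print(result)
```

Because crosstab fills the grid with zeros rather than NaN, stack() keeps every cell here without needing dropna=False.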
edit1:
To have the grouped key "A" displayed in a certain format (i.e. keep the first occurrence while replacing the rest with an empty string):
df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""
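The blanking step can be tried in isolation. In this sketch df_g is typed in by hand to match the six-row result, and the copy named display is an assumption of the example, kept separate so the original keys stay available for further computation.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the stacked result (values taken from the question).
df_g = pd.DataFrame({"A": ["a1", "a1", "a1", "a2", "a2", "a2"],
                     "B": ["b1", "b2", "b3", "b1", "b2", "b3"],
                     "Count": [1, 1, np.nan, 1, np.nan, 1]})

# duplicated("A") is False for the first row of each key and True for repeats.
repeats = df_g.duplicated("A")

# Blank the repeated keys on a copy used only for display.
display = df_g.copy()
display.loc[repeats, "A"] = ""
print(display)
```

This only looks right when the rows for each key are already adjacent, and once blanked the column can no longer be used for grouping.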
edit2:
If you want "A" shown as a single merged cell, as part of a multi-indexed DF:
df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count").set_index(['A', 'B'])
You could group by both columns and access the size of each group:
df.groupby(['A', 'B']).size()
returns:
A   B
a1  b1    1
    b2    1
a2  b1    1
    b3    1
dtype: int64
It won't give you NaNs for non-existing combinations, though.
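One way to fill that gap, sketched here as an assumption rather than as part of the answer above, is to reindex the sizes over every possible (A, B) pair. Reindexing introduces NaN wherever a pair has no rows, and it avoids relying on the dropna argument of stack().

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Sizes for the combinations that actually occur.
sizes = df.groupby(["A", "B"]).size()

# Every (A, B) pair, built from the distinct values of each column.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)

# Missing pairs get NaN; the counts become floats as a result.
counts = sizes.reindex(full_index).reset_index(name="Count")
print(counts)
```

If zeros are preferred over NaN for the absent pairs, passing fill_value=0 to reindex keeps the counts as integers.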