Pandas: how to groupby with count with multiple levels on rows?

Tags:

python

pandas

I have the following dataframe:

|----|----|
| A  | B  |
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |

I want to count by B per A and get the following result:

|----|----|-------|
| A  | B  | Count |
| a1 | b1 |  1    |
|    | b2 |  1    |
|    | b3 |  NaN  |
| a2 | b1 |  1    |
|    | b2 |  NaN  |
|    | b3 |  1    |

I usually do this with df.groupby(['B'])['A'].count(), but in this case the pivot-table-like layout is confusing me.

Thanks in advance.

Update:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B    20422 non-null object
A             20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB

With df.groupby(['B'])['A'].value_counts().unstack().stack(dropna=False).reset_index(name="Count") I get:

|--|----|----|-------|
|  | A  | B  | Count |
|0 | a1 | b1 |  1    |
|1 | a1 | b2 |  1    |
|2 | a1 | b3 |  NaN  |
|3 | a2 | b1 |  1    |
|4 | a2 | b2 |  NaN  |
|5 | a2 | b3 |  1    |


asked Mar 24 '17 11:03

Novitoll

2 Answers

1) One way is to group on "A" and count how often each value occurs under "B" using value_counts. Chaining unstack with stack(dropna=False) then keeps the missing combinations as NaN and gives the desired DF:

df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")

2) pd.crosstab is a good alternative, provided we replace the zero counts with np.nan after stacking:

pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
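As a minimal sketch of the crosstab route, here it is run on the four sample rows from the question. The variable names wide and result are illustrative only and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# crosstab builds a dense A x B table of counts, with 0 for absent pairs.
wide = pd.crosstab(df["A"], df["B"])

# Stack back to long form, then turn the zero counts into NaN.
result = (wide.stack()
              .replace({0: np.nan})
              .reset_index(name="Count"))
print(result)
```

Because NaN is a float, the Count column ends up as floating point rather than integer.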

Both approaches yield one row for every (A, B) combination, with NaN in Count where the combination does not occur in the data.

edit1:

To display the grouped key "A" in a particular format (i.e. keep the first occurrence and replace the rest with an empty string):

df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""
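To see what the duplicated mask does, here is a small sketch on a hand-built frame. The contents of df_g below are typed in by hand for illustration (assumed to match the long-form result for the question's data) and the name mask is hypothetical.

```python
import pandas as pd

# Hand-built long-form result: one row per (A, B) pair (illustrative data).
df_g = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a2", "a2", "a2"],
    "B": ["b1", "b2", "b3", "b1", "b2", "b3"],
    "Count": [1.0, 1.0, float("nan"), 1.0, float("nan"), 1.0],
})

# duplicated("A") is False for the first row of each A value, True afterwards.
mask = df_g.duplicated("A")

# Blank out the repeated keys; only the first row of each group keeps its label.
df_g.loc[mask, "A"] = ""
print(df_g)
```

This is a display-only change: once the repeated keys are blanked, column "A" can no longer be used for grouping or lookups.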


edit2:

If you want "A" shown as a single merged cell, make it part of a multi-indexed DF:

(df.groupby('A')['B'].value_counts()
   .unstack().stack(dropna=False)
   .reset_index(name="Count")
   .set_index(['A', 'B']))


answered Nov 09 '22 20:11

Nickil Maveli

You could group by both columns and take the size of each group:

 df.groupby(['A', 'B']).size()

returns:

A   B 
a1  b1    1
    b2    1
a2  b1    1
    b3    1
dtype: int64

It won't give you NaN for non-existent combinations, though.
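If the missing combinations are needed, one possible extension (not part of the original answer) is to reindex the sizes against the full product of A and B values. This is a sketch under that assumption; full_index and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Sizes of the groups that actually occur (4 rows here).
counts = df.groupby(["A", "B"]).size()

# Every possible (A, B) pair, built from the values present in each column.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)

# Pairs absent from `counts` become NaN, which turns the counts into floats.
result = counts.reindex(full_index).rename("Count")
print(result)
```

The result is a Series with a two-level index; calling reset_index() on it gives flat A, B and Count columns.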

answered Nov 09 '22 19:11

Tim Tröndle
