pandas dataframe: how to count the number of 1 rows in a binary column?

Question

I have the following pandas DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({"first_column": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]})

>>> df
    first_column
0              0
1              0
2              0
3              1
4              1
5              1
6              0
7              0
8              1
9              1
10             0
11             0
12             0
13             0
14             1
15             1
16             1
17             1
18             1
19             0
20             0

first_column is a binary column of 0s and 1s. It contains "clusters" of consecutive ones, and each cluster is always at least two rows long.

My goal is to create a column that holds, for every row in a cluster, the number of rows in that cluster (and 0 for rows where first_column is 0):

>>> df
    first_column    counts
0              0        0
1              0        0
2              0        0
3              1        3
4              1        3
5              1        3
6              0        0
7              0        0
8              1        2
9              1        2
10             0        0
11             0        0
12             0        0
13             0        0
14             1        5
15             1        5
16             1        5
17             1        5
18             1        5
19             0        0
20             0        0

This sounds like a job for df.loc, e.g. something starting from df.loc[df.first_column == 1]...

I'm just not sure how to take into account each individual "cluster" of ones, and how to label each of the unique clusters with the "row count".

How would one do this?

Divakar · Accepted Answer

Here's one approach with NumPy's cumsum and bincount -

def cumsum_bincount(a):
    # a: 1D NumPy integer array of 0s and 1s.
    # Prepend 0 & look for a [0,1] pattern (the start of each 1s group).
    # The cumsum of those starts, masked by a, gives one id per 1s group.
    ids = a*(np.diff(np.r_[0,a])==1).cumsum()

    # Get the bincount, index into the count with ids and finally mask out 0s
    return a*np.bincount(ids)[ids]
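
To make the two steps easier to follow, here is a small walk-through of the intermediate arrays. It is only an illustrative sketch: the short input array and the names starts, sizes and counts are made up for this example and are not part of the function above.

```python
import numpy as np

a = np.array([0, 1, 1, 1, 0, 1, 1, 0])

# Prepend a 0 so that a run starting at index 0 would also be detected,
# then mark every 0 -> 1 transition (the first element of each run of 1s).
starts = np.diff(np.r_[0, a]) == 1

# Running count of run starts; multiplying by a sets the 0 positions to id 0.
ids = a * starts.cumsum()

# Size of every id group. Index 0 holds the number of zeros, masked out below.
sizes = np.bincount(ids)

# Look up each row's group size, then zero out the rows where a is 0.
counts = a * sizes[ids]

print(starts.astype(int))  # [0 1 0 0 0 1 0 0]
print(ids)                 # [0 1 1 1 0 2 2 0]
print(sizes)               # [3 3 2]
print(counts)              # [0 3 3 3 0 2 2 0]
```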

Sample run -

In [88]: df['counts'] = cumsum_bincount(df.first_column.values)

In [89]: df
Out[89]: 
    first_column  counts
0              0       0
1              0       0
2              0       0
3              1       3
4              1       3
5              1       3
6              0       0
7              0       0
8              1       2
9              1       2
10             0       0
11             0       0
12             0       0
13             0       0
14             1       5
15             1       5
16             1       5
17             1       5
18             1       5
19             0       0
20             0       0
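
For comparison, the same column can probably also be built with pandas alone. The following is a sketch of a different technique (a shift/cumsum run label plus groupby/transform), not the NumPy approach above; run_id is a name chosen for this example, and it assumes the column only contains 0s and 1s.

```python
import pandas as pd

df = pd.DataFrame({"first_column": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]})

s = df["first_column"]

# A new run begins wherever the value differs from the previous row;
# the cumulative sum of those change points labels each run.
run_id = s.ne(s.shift()).cumsum()

# Sum within each run: a run of 1s sums to its length, a run of 0s sums to 0.
df["counts"] = s.groupby(run_id).transform("sum")

print(df["counts"].tolist())
```

This tends to be slower than the NumPy version on large columns, but it stays within pandas and does not need the raw array.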

Set the first 5 elems to 1s (index 5 is already 1, so the first 6 elems end up as 1s) and then test out -

In [101]: df.first_column.values[:5] = 1

In [102]: df['counts'] = cumsum_bincount(df.first_column.values)

In [103]: df
Out[103]: 
    first_column  counts
0              1       6
1              1       6
2              1       6
3              1       6
4              1       6
5              1       6
6              0       0
7              0       0
8              1       2
9              1       2
10             0       0
11             0       0
12             0       0
13             0       0
14             1       5
15             1       5
16             1       5
17             1       5
18             1       5
19             0       0
20             0       0
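
One way to gain some confidence in the edge cases (a run at the very start, a run at the very end) is to compare against a slow but obvious reference. This is a minimal sketch: run_lengths_reference is a hypothetical helper written for this check using itertools.groupby, and the two test arrays are made up.

```python
from itertools import groupby

import numpy as np


def cumsum_bincount(a):
    # Same function as in the answer above.
    ids = a * (np.diff(np.r_[0, a]) == 1).cumsum()
    return a * np.bincount(ids)[ids]


def run_lengths_reference(values):
    # Hypothetical checking helper: every run of 1s is expanded to its
    # length, every run of 0s is expanded to 0s.
    out = []
    for key, grp in groupby(values):
        n = len(list(grp))
        out.extend([n if key == 1 else 0] * n)
    return out


# A run that starts at index 0, and a run that ends at the last index.
leading = np.array([1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
trailing = np.array([0, 1, 1, 0, 1, 1, 1])

for arr in (leading, trailing):
    assert cumsum_bincount(arr).tolist() == run_lengths_reference(arr.tolist())
```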

Tags: python, pandas, dataframe, group-by, pandas-groupby

ShanZhengYang
