 

How to count consecutive repetitions in a pandas series

Tags:

python

pandas

Consider the following series, `ser`:

date        id 
2000        NaN
2001        NaN 
2001        1
2002        1
2000        2
2001        2
2002        2
2001        NaN
2010        NaN
2000        1
2001        1
2002        1
2010        NaN

How can I count the values so that each run of consecutive equal values (consecutive NaNs included) is counted and returned together with its value? The expected result is:

Count
NaN     2 
1       2 
2       3
NaN     2
1       3
NaN     1
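To try the answers below, the sample data can be built as follows. This is a sketch that assumes `id` holds floats with NaN for missing values. Both answers refer to `df.id`, so the series is turned into a DataFrame with `reset_index()`, which moves `date` into a column and gives the 0 to 12 row labels seen in the second answer's output.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the dates form the index
dates = [2000, 2001, 2001, 2002, 2000, 2001, 2002, 2001, 2010, 2000, 2001, 2002, 2010]
values = [np.nan, np.nan, 1, 1, 2, 2, 2, np.nan, np.nan, 1, 1, 1, np.nan]
ser = pd.Series(values, index=pd.Index(dates, name='date'), name='id')

# Both answers work on a DataFrame with an `id` column and a 0..12 row index
df = ser.reset_index()
```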
asked Jul 30 '19 06:07 by Oli



2 Answers

Here is another approach using fillna to handle NaN values:

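import numpy as np
import pandas as pd

# Replace NaN with a placeholder string so consecutive NaNs compare equal
s = df.id.fillna('nan')
# True wherever a new run of equal values starts
mask = s.ne(s.shift())

ids = s[mask].to_numpy()
# Run length = size of each group of consecutive equal values
counts = s.groupby(mask.cumsum()).size().to_numpy()

# Convert the 'nan' placeholder back to `NaN`
ids[ids == 'nan'] = np.nan
ser_out = pd.Series(counts, index=ids, name='counts')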

[out]

nan    2
1.0    2
2.0    3
nan    2
1.0    3
nan    1
Name: counts, dtype: int64
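The `fillna('nan')` step matters because NaN never compares equal to anything, including another NaN, so comparing the raw column with its shifted copy flags every NaN as the start of a new run. A minimal sketch on a short, hypothetical series, showing the naive comparison next to one that treats two adjacent NaNs as equal (an alternative to the placeholder string):

```python
import numpy as np
import pandas as pd

# Hypothetical example: two NaNs followed by two 1.0s
s = pd.Series([np.nan, np.nan, 1.0, 1.0])
prev = s.shift()

# Naive run-start test: NaN != NaN, so both NaNs are flagged as new runs
naive_start = s.ne(prev)

# NaN-aware test: "equal to the previous value, or both NaN" continues a run
same_as_prev = s.eq(prev) | (s.isna() & prev.isna())
same_as_prev.iloc[0] = False  # the first element always starts a run
run_start = ~same_as_prev
```

Here `naive_start` is `[True, True, True, False]`, while `run_start` is `[True, False, True, False]`; `run_start.cumsum()` then yields one label per run (1, 1, 2, 2) that can be passed to `groupby`.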
answered Sep 25 '22 08:09 by Chris Adams


The cumsum trick is useful here. It's a little tricky with the NaNs, though (NaN never equals NaN), so I think you need to handle them separately:

In [11]: df.id.isnull() & df.id.shift(-1).isnull()
Out[11]:
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7      True
8     False
9     False
10    False
11    False
12     True
Name: id, dtype: bool

In [12]: df.id.eq(df.id.shift(-1))
Out[12]:
0     False
1     False
2      True
3     False
4      True
5      True
6     False
7     False
8     False
9      True
10     True
11    False
12    False
Name: id, dtype: bool

In [13]: (df.id.isnull() & df.id.shift(-1).isnull()) | (df.id.eq(df.id.shift(-1)))
Out[13]:
0      True
1     False
2      True
3     False
4      True
5      True
6     False
7      True
8     False
9      True
10     True
11    False
12     True
Name: id, dtype: bool

In [14]: ((df.id.isnull() & df.id.shift(-1).isnull()) | (df.id.eq(df.id.shift(-1)))).cumsum()
Out[14]:
0     1
1     1
2     2
3     2
4     3
5     4
6     4
7     5
8     5
9     6
10    7
11    7
12    8
Name: id, dtype: int64

Now you can use this labeling in your groupby:

In [15]: g = df.groupby(((df.id.isnull() & df.id.shift(-1).isnull()) | (df.id.eq(df.id.shift(-1)))).cumsum())

In [16]: pd.DataFrame({"count": g.id.size(), "id": g.id.first()})
Out[16]:
    count   id
id
1       2  NaN
2       2  1.0
3       1  2.0
4       2  2.0
5       2  NaN
6       1  1.0
7       2  1.0
8       1  NaN
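Note that the labeling above increments the counter whenever an element equals the *next* one, so runs of three or more get split: rows 4 to 6 (the three 2s) land in groups 3 and 4, and rows 9 to 11 in groups 6 and 7, instead of the counts of 3 the question asks for. A sketch of a variant that compares each element with the *previous* one, so the counter only increments at the start of a run. It assumes the same `df` with an `id` column (rebuilt here so the block runs on its own), and `run` is just an illustrative name for the label series. It uses `first()` rather than `nth(0)` because in pandas 2.0+ `nth` keeps the original row index instead of the group labels; since every group holds a single repeated value, `first()` returns that value (NaN for all-NaN groups).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [np.nan, np.nan, 1, 1, 2, 2, 2, np.nan, np.nan, 1, 1, 1, np.nan]})

prev = df.id.shift()
# An element continues a run if it equals the previous one, or both are NaN
same_as_prev = df.id.eq(prev) | (df.id.isnull() & prev.isnull())
same_as_prev.iloc[0] = False  # the first row always starts a run

# cumsum of the run starts gives one label per run: 1, 1, 2, 2, 3, 3, 3, ...
runs = (~same_as_prev).cumsum().rename('run')

g = df.groupby(runs)
result = pd.DataFrame({"count": g.id.size(), "id": g.id.first()})
```

With the question's data this gives counts 2, 2, 3, 2, 3, 1 for the values NaN, 1.0, 2.0, NaN, 1.0, NaN, matching the expected output.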
answered Sep 21 '22 08:09 by Andy Hayden