Consider the following series, ser
date id
2000 NaN
2001 NaN
2001 1
2002 1
2000 2
2001 2
2002 2
2001 NaN
2010 NaN
2000 1
2001 1
2002 1
2010 NaN
How do I count the values so that each run of consecutive identical values (runs of NaN included) is counted and returned, giving the output below? Thanks.
Count
NaN 2
1 2
2 3
NaN 2
1 3
NaN 1
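For reference, here is a minimal setup sketch that reproduces the sample above; the answers below refer to the data as a DataFrame named df with an id column, so that is what this builds (the name df and the column names just follow the answers, nothing in the question fixes them):
import numpy as np
import pandas as pd

# Sample data from the question; `df` is the name the answers below use.
df = pd.DataFrame({
    "date": [2000, 2001, 2001, 2002, 2000, 2001, 2002,
             2001, 2010, 2000, 2001, 2002, 2010],
    "id":   [np.nan, np.nan, 1, 1, 2, 2, 2,
             np.nan, np.nan, 1, 1, 1, np.nan],
})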
Here is another approach, using fillna to handle the NaN values:
import numpy as np
import pandas as pd

# `df` is the DataFrame from the question, with the values in an `id` column.
s = df.id.fillna('nan')   # use a sentinel string so NaN values compare equal
mask = s.ne(s.shift())    # True at the first row of every run
ids = s[mask].to_numpy()  # one id per run, in order
# Length of each run (equivalent to s.groupby(mask.cumsum()).size())
counts = s.groupby(mask.cumsum()).cumcount().add(1).groupby(mask.cumsum()).max().to_numpy()
# Convert the 'nan' sentinel back to NaN
ids[ids == 'nan'] = np.nan
ser_out = pd.Series(counts, index=ids, name='counts')
[out]
NaN    2
1.0    2
2.0    3
NaN    2
1.0    3
NaN    1
Name: counts, dtype: int64
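If you would rather have the result as a two-column table than as a Series with repeated index labels, the pieces computed above can be combined directly (out_df is just an illustrative name; it reuses ids and counts from the snippet above):
# Pair each run's id with its length; reuses `ids` and `counts` from above.
out_df = pd.DataFrame({"id": ids, "count": counts})
print(out_df)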
The cumsum trick is useful here; it is a little tricky with the NaNs, though, so you need to handle them separately. Compare each row with the previous one. First, flag rows where both the current and the previous id are NaN:
In [11]: df.id.isnull() & df.id.shift().isnull()
Out[11]:
0      True
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
Name: id, dtype: bool
Then flag rows whose id equals the previous id:
In [12]: df.id.eq(df.id.shift())
Out[12]:
0     False
1     False
2     False
3      True
4     False
5      True
6      True
7     False
8     False
9     False
10     True
11     True
12    False
Name: id, dtype: bool
Either condition means the row continues the run started by the previous row:
In [13]: (df.id.isnull() & df.id.shift().isnull()) | (df.id.eq(df.id.shift()))
Out[13]:
0      True
1      True
2     False
3      True
4     False
5      True
6      True
7     False
8      True
9     False
10     True
11     True
12    False
Name: id, dtype: bool
A row starts a new run exactly when neither condition holds, so negate and take the cumulative sum to get one label per run:
In [14]: (~((df.id.isnull() & df.id.shift().isnull()) | (df.id.eq(df.id.shift())))).cumsum()
Out[14]:
0     0
1     0
2     1
3     1
4     2
5     2
6     2
7     3
8     3
9     4
10    4
11    4
12    5
Name: id, dtype: int64
Now you can use this labeling in your groupby:
In [15]: g = df.groupby((~((df.id.isnull() & df.id.shift().isnull()) | (df.id.eq(df.id.shift())))).cumsum())

In [16]: pd.DataFrame({"count": g.id.size(), "id": g.id.nth(0)})
Out[16]:
    count   id
id
0       2  NaN
1       2  1.0
2       3  2.0
3       2  NaN
4       3  1.0
5       1  NaN
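If you want that result in the exact shape shown in the question (id values as the index, run lengths as the data), one more reshaping step works; result and ser_counts are illustrative names, not part of the answer above:
# Reuses the grouped object `g` from In [15].
result = pd.DataFrame({"count": g.id.size(), "id": g.id.nth(0)})
ser_counts = result.set_index("id")["count"]   # index: NaN, 1.0, 2.0, NaN, 1.0, NaN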