I have a data frame like this:
0 04:10 obj1
1 04:10 obj1
2 04:11 obj1
3 04:12 obj2
4 04:12 obj2
5 04:12 obj1
6 04:13 obj2
Wanted to get a cumulative count for all the objects like this:
idx time object obj1_count obj2_count
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
Tried playing with cumsum but not sure that is the right way. Any suggestions?
There is a dedicated function for exactly this kind of operation: cumcount
>>> df = pd.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']], columns=['A'])
>>> df
A
0 a
1 a
2 a
3 b
4 b
5 a
>>> df.groupby('A').cumcount()
0 0
1 1
2 2
3 0
4 1
5 3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False)
0 3
1 2
2 1
3 1
4 0
5 0
dtype: int64
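Applied to the question's data (reconstructed here as a small frame, since the original construction isn't shown), cumcount gives a single running count per object; it starts at 0 within each group, so adding 1 makes it 1-based:

```python
import pandas as pd

# Reconstruction of the question's data.
df = pd.DataFrame({
    'time': ['04:10', '04:10', '04:11', '04:12', '04:12', '04:12', '04:13'],
    'object': ['obj1', 'obj1', 'obj1', 'obj2', 'obj2', 'obj1', 'obj2'],
})

# cumcount() numbers each row within its group starting at 0;
# +1 turns it into a running 1-based count per object.
df['running_count'] = df.groupby('object').cumcount() + 1
```

Note this produces one combined column rather than the separate obj1_count/obj2_count columns in the desired output; the answers below show how to get per-object columns.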
You can just compare the column against the value of interest and call cumsum:
In [12]:
df['obj1_count'] = (df['object'] == 'obj1').cumsum()
df['obj2_count'] = (df['object'] == 'obj2').cumsum()
df
Out[12]:
time object obj1_count obj2_count
idx
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
Here the comparison will produce a boolean series:
In [13]:
df['object'] == 'obj1'
Out[13]:
idx
0 True
1 True
2 True
3 False
4 False
5 True
6 False
Name: object, dtype: bool
When you call cumsum on the above, the True values are converted to 1 and the False values to 0, and these are summed cumulatively.
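With more than two objects, the same comparison trick can be wrapped in a loop over the unique values rather than writing one line per object. A minimal sketch, with column names following the question's obj1_count/obj2_count convention:

```python
import pandas as pd

# Reconstruction of the question's object column.
df = pd.DataFrame({
    'object': ['obj1', 'obj1', 'obj1', 'obj2', 'obj2', 'obj1', 'obj2'],
})

# One cumulative-count column per distinct value, built from
# the same boolean comparison as above.
for name in df['object'].unique():
    df[f'{name}_count'] = (df['object'] == name).cumsum()
```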
You can generalize this process by taking the cumsum of pd.get_dummies. This works for an arbitrary number of objects you want to count, without needing to specify each one individually:
# Get the cumulative counts.
counts = pd.get_dummies(df['object']).cumsum()
# Rename the count columns as appropriate.
counts = counts.rename(columns=lambda col: col+'_count')
# Join the counts to the original df.
df = df.join(counts)
The resulting output:
time object obj1_count obj2_count
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
You can omit the rename step if it's acceptable to use count as a prefix instead of a suffix, i.e. 'count_obj1' instead of 'obj1_count'. Simply use the prefix parameter of pd.get_dummies:
counts = pd.get_dummies(df['object'], prefix='count').cumsum()
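One caveat: in newer pandas (2.0 and later) pd.get_dummies returns boolean indicator columns by default; cumsum on those still yields integer counts, but you can pass dtype=int to make the intent explicit. A minimal sketch, reconstructing the question's object column:

```python
import pandas as pd

# Reconstruction of the question's object column.
df = pd.DataFrame({
    'object': ['obj1', 'obj1', 'obj1', 'obj2', 'obj2', 'obj1', 'obj2'],
})

# dtype=int requests integer indicator columns up front.
counts = pd.get_dummies(df['object'], prefix='count', dtype=int).cumsum()
df = df.join(counts)
```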
Here's a way using numpy:
import numpy as np
import pandas as pd

# Unique labels, plus for each row the index of its label in u.
u, iv = np.unique(df.object.values, return_inverse=True)

# Broadcasting iv against the label indices yields a one-hot matrix;
# cumsum(0) turns it into running counts down the rows.
objcount = pd.DataFrame(
    (iv[:, None] == np.arange(len(u))).cumsum(0),
    df.index, u
)

pd.concat([df, objcount], axis=1)