 

Pandas: Incrementally count occurrences in a column

I have a DataFrame (df) which contains a 'Name' column. In a column labeled 'Occ_Number' I would like to keep a running tally of the number of appearances of each value in 'Name'.

For example:

Name            Occ_Number
 abc                     1
 def                     1
 ghi                     1
 abc                     2
 abc                     3
 def                     2
 jkl                     1
 jkl                     2

I've been trying to come up with a method using

df['Name'].value_counts()

but can't quite figure out how to tie it all together: value_counts() only gives me the grand total for each name. So far, my process involves creating a list of the 'Name' values that appear more than once, with the following code:

things = df['Name'].value_counts()
things = things[things > 1]
queries = things.index.values

I was hoping to then somehow cycle through 'Name' and conditionally add to Occ_Number by checking against queries, but this is where I'm getting stuck. Does anybody know of a way to do this? I would appreciate any help. Thank you!
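The loop described above can be written directly with a dictionary of running counts, without value_counts() or the queries list. This is a minimal sketch assuming the sample data from the table; the names `seen` and `occ` are hypothetical helpers introduced for illustration. A plain Python loop like this is simple but slower than the vectorized groupby answers on large frames.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['abc', 'def', 'ghi', 'abc', 'abc', 'def', 'jkl', 'jkl']})

seen = {}  # hypothetical helper: name -> how many times it has appeared so far
occ = []   # running tally for each row, in row order
for name in df['Name']:
    seen[name] = seen.get(name, 0) + 1
    occ.append(seen[name])

df['Occ_Number'] = occ
print(df)
```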

asked Feb 19 '15 03:02 by big_ligands


2 Answers

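You can use cumcount to avoid a dummy column. groupby("Name").cumcount() numbers the rows within each name starting from 0, so add 1 to get a 1-based tally: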

>>> df["Occ_Number"] = df.groupby("Name").cumcount()+1
>>> df
  Name  Occ_Number
0  abc           1
1  def           1
2  ghi           1
3  abc           2
4  abc           3
5  def           2
6  jkl           1
7  jkl           2
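A self-contained version of the cumcount approach, with the import and the sample data from the question, might look like this. cumcount() returns a Series aligned with the original index, so the assignment lines up with the original row order even though the names are interleaved.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['abc', 'def', 'ghi', 'abc', 'abc', 'def', 'jkl', 'jkl']})

# cumcount() is 0-based within each 'Name' group; + 1 makes it a 1-based tally.
# The result keeps df's index, so each value lands on the row it was computed for.
df['Occ_Number'] = df.groupby('Name').cumcount() + 1
print(df)
```

Passing ascending=False to cumcount() numbers each group from the end instead, so the last occurrence of a name gets 0.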
answered Oct 05 '22 09:10 by DSM


You can add a helper column and then use cumsum:

import pandas as pd

df = pd.DataFrame({'Name': ['abc', 'def', 'ghi', 'abc', 'abc', 'def', 'jkl', 'jkl']})

Add a helper column of ones:

df['counts'] = 1

Group by name and take the cumulative sum of the ones; within each name this counts the rows seen so far:

cs = df.groupby('Name')['counts'].cumsum()
# set series name
cs.name = 'Occ_number'

Join the series back to the dataframe:

# remove helper column
del df['counts']
# join() returns a new DataFrame; use df = df.join(cs) to keep the column
df.join(cs)

returns:

  Name  Occ_number
0  abc           1
1  def           1
2  ghi           1
3  abc           2
4  abc           3
5  def           2
6  jkl           1
7  jkl           2
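The helper-column idea can also be written without adding and deleting a column on df, by putting the column of ones on a temporary frame built with assign(). This is a minimal sketch using the same sample data; `ones` is a hypothetical name, and the result column uses the question's spelling, Occ_Number.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['abc', 'def', 'ghi', 'abc', 'abc', 'def', 'jkl', 'jkl']})

# assign() returns a new DataFrame, so df itself never gets the helper column.
ones = df.assign(counts=1)
# Cumulative sum of the ones within each name = running occurrence number.
df['Occ_Number'] = ones.groupby('Name')['counts'].cumsum()
print(df)
```

On this data it gives the same numbers as cumcount() + 1; the cumcount form simply skips the helper column altogether.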
answered Oct 05 '22 10:10 by JAB