I have a DataFrame (df) which contains a 'Name' column. In a column labeled 'Occ_Number' I would like to keep a running tally on the number of appearances of each value in 'Name'.
For example:
Name Occ_Number
abc 1
def 1
ghi 1
abc 2
abc 3
def 2
jkl 1
jkl 2
I've been trying to come up with a method using
>df['Name'].value_counts()
but can't quite figure out how to tie it all together. I can only get the grand total from value_counts(). My process thus far involves creating a list of the 'Name' column string values which contain counts greater than 1 with the following code:
>things = df['Name'].value_counts()
>things = things[things > 1]
>queries = things.index.values
I was hoping to then somehow cycle through 'Name' and conditionally add to Occ_Number by checking against queries, but this is where I'm getting stuck. Does anybody know of a way to do this? I would appreciate any help. Thank you!
Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.
You can count the number of duplicate rows by counting True in pandas. Series obtained with duplicated() . The number of True can be counted with sum() method. If you want to count the number of False (= the number of non-duplicate rows), you can invert it with negation ~ and then count True with sum() .
value_counts() function returns object containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element.
The value_counts() method returns a Series containing the counts of unique values. This means, for any column in a dataframe, this method returns the count of unique entries in that column.
You can use cumcount
to avoid a dummy column:
>>> df["Occ_Number"] = df.groupby("Name").cumcount()+1
>>> df
Name Occ_Number
0 abc 1
1 def 1
2 ghi 1
3 abc 2
4 abc 3
5 def 2
6 jkl 1
7 jkl 2
You can add a helper column and then use cumsum
:
df =pd.DataFrame({'Name':['abc', 'def', 'ghi', 'abc', 'abc', 'def', 'jkl', 'jkl']})
add count:
df['counts'] =1
group by name:
cs =df.groupby('Name')['counts'].cumsum()
# set series name
cs.name = 'Occ_number'
join series back to dataframe:
# remove helper column
del df['counts']
df.join(cs)
returns:
Name Occ_number
0 abc 1
1 def 1
2 ghi 1
3 abc 2
4 abc 3
5 def 2
6 jkl 1
7 jkl 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With