Given a DataFrame that logs uses of some books like this:
Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2
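For a reproducible setup, the frame can be built like this (a minimal sketch; the variable name df is assumed throughout):

import pandas as pd

# Sample usage log matching the table above
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})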
I need to get the count of each book while keeping the other columns, to end up with this:
Name   Type   ID  Count
Book1  ebook  1   2
Book2  paper  2   2
Book3  paper  3   1
How can this be done?
Thanks!
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
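To illustrate that SQL-like flavour, here is a sketch using named aggregation on the example frame (the output names n_uses and n_ids are made up for this example):

# Roughly: SELECT Type, COUNT(ID) AS n_uses, COUNT(DISTINCT ID) AS n_ids
#          FROM df GROUP BY Type
df.groupby('Type').agg(
    n_uses=('ID', 'count'),    # non-null values per group
    n_ids=('ID', 'nunique'),   # distinct IDs per group
)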
By default, groupby preserves the order of rows within each group. When calling apply, the group_keys option adds the group keys to the index so you can tell which piece each row came from; the function you pass should reduce the dimensionality of its result if possible, and otherwise return a consistent type.
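A small sketch of the group_keys behaviour in recent pandas (the lambda is just illustrative; selecting the non-key columns avoids applying over the grouping column):

# With group_keys=True, 'Name' is prepended to the result's index,
# labelling which group each row came from.
df.groupby('Name', group_keys=True)[['Type', 'ID']].apply(lambda g: g.head(1))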
You want the following:
In [20]: df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
In your case the 'Name', 'Type' and 'ID' columns match in values, so we can groupby on these, call size to count the rows in each group, and then reset_index, passing name='Count' to label the new column. (count() would count non-null values in the remaining columns, but here every column is part of the grouping, so size is the method that counts rows per group.)
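To see why the reset_index step is needed: the intermediate result carries the group keys as a MultiIndex, roughly like this:

counts = df.groupby(['Name', 'Type', 'ID']).size()  # Series with a MultiIndex
# Name   Type   ID
# Book1  ebook  1     2
# Book2  paper  2     2
# Book3  paper  3     1
# dtype: int64
counts.reset_index(name='Count')  # flatten the keys back into columns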
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]: df['Count'] = df.groupby(['Name'])['ID'].transform('count')
         df.drop_duplicates()
Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
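This works because transform, unlike an aggregation, returns a result aligned with the original index, one value per input row, so it can be assigned straight back as a column. A quick sketch of that alignment:

per_row = df.groupby('Name')['ID'].transform('count')
print(len(per_row) == len(df))          # True: one value per original row
print(per_row.index.equals(df.index))   # True: same index, so assignment aligns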
I think as_index=False should do the trick; it keeps the group keys as regular columns, so the separate reset_index call isn't needed:

df.groupby(['Name','Type','ID'], as_index=False).size()
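In recent pandas this names the count column 'size' rather than 'Count'; a sketch of renaming it to match the desired output (assuming the example df above):

out = df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
out = out.rename(columns={'size': 'Count'})  # match the requested column name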