I have a huge municipal library catalog dataset with the book title, the library it's in, the library's borough, and the number of times it was loaned out.
I want to find the 3 most loaned books in each borough.
Ideally, I'd get something like this:
Borough  Title  Total_loans
A        Book1  35615
A        Book2  34895
A        Book3  2548
B        Book1  6541
B        Book2  5425
etc.
This is the closest I was able to get, but the resulting DataFrame is not grouped by borough and is hard to read.
import pandas as pd
df = pd.DataFrame({"borough":["A", "B", "B", "A", "A"], "title":["Book2", "Book1", "Book2", "Book2", "Book1"], "total_loans":[4, 48, 46, 78, 15]})
top_boroughs = df.groupby(['borough', 'title'])
top_boroughs.sum().sort_values(['total_loans', 'title'], ascending=False)  # DataFrame.sort was removed from pandas; sort_values replaces it
Thanks for your help.
In short:
df.groupby(['borough', 'title'])['total_loans'].sum().reset_index().sort_values(['borough', 'total_loans'], ascending=[True, False]).groupby('borough').head(3)
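The steps:
1. Sum total_loans for each (borough, title) pair, then reset_index so borough and title become ordinary columns again.
2. Sort by borough ascending and total_loans descending.
3. Group by borough again and take the first 3 rows of each group with head(3).
This is better than the accepted answer because it does not need concat, which wastes memory. My output (using head(1), since the test data has only 2 rows per group):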
Out[484]:
  borough  title  total_loans
1       A  Book2           82
2       B  Book1           48
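The same three steps can be written out more explicitly. This is a sketch, not the answerer's code: the function name top_n_per_borough and the library column in the sample data are hypothetical additions, used to show that loans for one title held by several libraries in the same borough are added together before ranking. Selecting total_loans before summing keeps the text library column out of the aggregation.

```python
import pandas as pd


def top_n_per_borough(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n most loaned titles per borough, summed across libraries.

    top_n_per_borough is a hypothetical helper name used for illustration.
    """
    # 1. Sum loans per (borough, title); rows for the same book in different
    #    libraries of one borough collapse into a single row.
    totals = df.groupby(["borough", "title"], as_index=False)["total_loans"].sum()
    # 2. Boroughs alphabetically, and within each borough by loans, largest first.
    totals = totals.sort_values(["borough", "total_loans"], ascending=[True, False])
    # 3. Keep the first n rows of each borough; head() preserves the sorted order.
    return totals.groupby("borough").head(n).reset_index(drop=True)


# Sample data from the question; the "library" column is hypothetical.
df = pd.DataFrame({
    "borough": ["A", "B", "B", "A", "A"],
    "library": ["A-1", "B-1", "B-2", "A-2", "A-1"],
    "title": ["Book2", "Book1", "Book2", "Book2", "Book1"],
    "total_loans": [4, 48, 46, 78, 15],
})

print(top_n_per_borough(df, n=1))
```

Calling it with n=3 on the real catalog gives the layout asked for in the question, with one block of rows per borough.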
Something like this:
t = df.groupby(['borough', 'title'])[['total_loans']].sum()
t = t.sort_values('total_loans', ascending=False)  # largest totals first
t = t.groupby(level=0).head(3).reset_index()  # level 0 is borough, so this keeps 3 titles per borough
t = t.sort_values(['borough', 'total_loans'], ascending=[True, False])  # put each borough's rows back together, most loaned first
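A different technique, not taken from either answer, is to call nlargest on each borough's totals. This is a sketch assuming a reasonably recent pandas: the groupby/nlargest result repeats the borough key in its index, so droplevel(0) removes the duplicate before reset_index.

```python
import pandas as pd

df = pd.DataFrame({
    "borough": ["A", "B", "B", "A", "A"],
    "title": ["Book2", "Book1", "Book2", "Book2", "Book1"],
    "total_loans": [4, 48, 46, 78, 15],
})

# Sum loans per (borough, title): a Series with a two-level index.
totals = df.groupby(["borough", "title"])["total_loans"].sum()

# Within each borough keep the 3 largest totals. groupby prepends the
# borough key as an extra index level, which droplevel(0) removes.
top3 = (
    totals.groupby(level="borough")
    .nlargest(3)
    .droplevel(0)
    .reset_index()
)
print(top3)
```

nlargest defaults to keep="first", so a tie at the cut-off keeps the first-seen title; keep="all" returns every tied title instead.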