I have a pandas DataFrame with the columns 'year', 'month' and 'transaction id'. I want to get the transaction count of every month for every year. For example, my data looks like this:
year: {2015,2015,2015,2016,2016,2017}
month: {1, 1, 2, 2, 2, 1}
tid: {123, 343, 453, 675, 786, 332}
I want output such that, for every year, I get the number of transactions per month. For example, for 2015 the output would be:
month: [1,2]
count: [2,1]
I used groupby('year'), but after that, how can I get the per-month transaction count?
You need `groupby` by both columns, `year` and `month`, and then aggregate with `size`:
```python
import pandas as pd

year = [2015, 2015, 2015, 2016, 2016, 2017]
month = [1, 1, 2, 2, 2, 1]
tid = [123, 343, 453, 675, 786, 332]
df = pd.DataFrame({'year': year, 'month': month, 'tid': tid})
print(df)
```
```
   year  month  tid
0  2015      1  123
1  2015      1  343
2  2015      2  453
3  2016      2  675
4  2016      2  786
5  2017      1  332
```
```python
# Group by both keys and count the rows in each (year, month) group
df1 = df.groupby(['year', 'month'])['tid'].size().reset_index(name='count')
print(df1)
```
```
   year  month  count
0  2015      1      2
1  2015      2      1
2  2016      2      2
3  2017      1      1
```
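If you want the per-year layout from the question (for each year, a list of months and a matching list of counts), one option is to group that result again by year. This is a minimal sketch, assuming the sample data from the question; `per_year` and `wide` are illustrative names, and `unstack(fill_value=0)` is shown as an alternative wide layout that fills months with no transactions with 0.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2015, 2015, 2016, 2016, 2017],
                   'month': [1, 1, 2, 2, 2, 1],
                   'tid': [123, 343, 453, 675, 786, 332]})

# Count rows for every (year, month) pair
df1 = df.groupby(['year', 'month'])['tid'].size().reset_index(name='count')

# Illustrative structure: {year: {'month': [...], 'count': [...]}}
per_year = {}
for year, grp in df1.groupby('year'):
    per_year[year] = {'month': grp['month'].tolist(),
                      'count': grp['count'].tolist()}

print(per_year[2015])  # {'month': [1, 2], 'count': [2, 1]}

# Alternative wide table: one row per year, one column per month
wide = df.groupby(['year', 'month']).size().unstack(fill_value=0)
print(wide)
```

The dictionary matches the format asked for in the question; the wide table is often easier to read or plot when there are many years.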
Another option, for more complex tasks: suppose you want to group by "year" and by a function applied to "tid", e.g. a bucket categorization:
```python
def tidBucket(x):
    # Map a transaction id to a coarse bucket label
    if x < 300:
        return "low"
    if 300 <= x < 700:
        return "medium"
    if 700 <= x:
        return "high"
```
Then the solution above does not work directly, because a function passed to `groupby` is applied to the index labels, not to a column. You could solve this by first grouping by year and then iterating over the groups, applying another `groupby` to each:
```python
gb = df.groupby(by='year')
for _, df1 in gb:
    # groupby(callable) calls the function on each index label,
    # so make 'tid' the index before grouping by tidBucket
    df1 = df1.set_index('tid')
    df1 = df1.groupby(by=tidBucket)
```
Then aggregate as desired. Alternatively, you could create an additional "bucket" column:

```python
df["bucket"] = df["tid"].map(tidBucket)
```

and follow @jezrael's solution.
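Both bucketing approaches can be checked against each other. This is a self-contained sketch, assuming the sample data from the question; `tidBucket` is rewritten with chained comparisons but assigns the same labels, and the aggregation in the loop is assumed to be a simple row count (`size`). `df2` and `loop_counts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2015, 2015, 2016, 2016, 2017],
                   'month': [1, 1, 2, 2, 2, 1],
                   'tid': [123, 343, 453, 675, 786, 332]})

def tidBucket(x):
    # Map a transaction id to a coarse bucket label
    if x < 300:
        return "low"
    if x < 700:
        return "medium"
    return "high"

# Approach 1: helper column, then the same groupby/size as for (year, month)
df["bucket"] = df["tid"].map(tidBucket)
df2 = df.groupby(['year', 'bucket']).size().reset_index(name='count')
print(df2)

# Approach 2: loop over years; groupby(callable) calls tidBucket on each
# index label, so 'tid' is moved into the index first
loop_counts = {}
for year, group in df.groupby('year'):
    sizes = group.set_index('tid').groupby(tidBucket).size()
    for bucket, n in sizes.items():
        loop_counts[(year, bucket)] = n

# Both approaches yield the same (year, bucket) -> count mapping
print(loop_counts == df2.set_index(['year', 'bucket'])['count'].to_dict())  # True
```

The helper-column version is usually simpler and vectorized; the loop is only worth it when each year's group needs its own, more involved processing.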