I have a pandas DataFrame with log data:
        host service
0   this.com    mail
1   this.com    mail
2   this.com     web
3   that.com    mail
4  other.net    mail
5  other.net     web
6  other.net     web
and I want to find, for each host, the service that generates the most errors:
        host service  no
0   this.com    mail   2
1   that.com    mail   1
2  other.net     web   2
The only solution I found was grouping by host and service, and then iterating over level 0 of the resulting index.
Can anyone suggest a better, shorter version, without the iteration?
import numpy as np
import pandas as pd

# count rows per (host, service) pair
df = df_logfile.groupby(['host','service']).agg({'service': np.size})
df_count = pd.DataFrame()
df_count['host'] = df_logfile['host'].unique()
df_count['service'] = np.nan
df_count['no'] = np.nan
# iterate over each host (level 0 of the MultiIndex)
for h, data in df.groupby(level=0):
    i = data.idxmax()[0]          # (host, service) index of the largest count
    service = i[1]
    no = data.xs(i)[0]
    df_count.loc[df_count['host'] == h, 'service'] = service
    df_count.loc[(df_count['host'] == h) & (df_count['service'] == service), 'no'] = no
Full code: https://gist.github.com/bjelline/d8066de66e305887b714
Given df, the next step is to group by host alone and aggregate with idxmax. This yields, for each host, the index of the row with the largest count. You can then use df.loc[...] to select those rows from df:
import numpy as np
import pandas as pd
df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com', 'other.net',
             'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web']})
# count rows per (host, service) pair; size() avoids the deprecated dict form of agg
df = df_logfile.groupby(['host','service']).size().to_frame('no')
mask = df.groupby(level=0).agg('idxmax')
df_count = df.loc[mask['no']]
df_count = df_count.reset_index()
print("\nOutput\n{}".format(df_count))
yields the DataFrame
        host service  no
0  other.net     web   2
1   that.com    mail   1
2   this.com    mail   2
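For reference, the same result can be obtained without idxmax at all, by sorting the counts and keeping the first (largest) row per host. This is a sketch of that alternative, built on the same sample data:

```python
import pandas as pd

df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com', 'other.net',
             'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web']})

# Count each (host, service) pair, then keep the row with the highest
# count within each host: sort by count descending and drop duplicate hosts.
df_count = (df_logfile.groupby(['host', 'service']).size()
            .reset_index(name='no')
            .sort_values('no', ascending=False)
            .drop_duplicates('host')
            .reset_index(drop=True))
print(df_count)
```

drop_duplicates keeps the first occurrence of each host, which after the descending sort is the service with the most entries; ties between services on the same host are broken arbitrarily.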