
How to add a column after using python groupby if the function is also a custom function?

I am stuck on what is probably a very simple problem. My data looks something like this:

id_not_unique  datetime             seconds
111111111      2020-08-26 15:44:58  122
111111111      2020-08-28 15:33:45  34
222222222      2020-07-12 11:21:09  26
222222222      2019-04-21 14:22:42  57

I want to group by id_not_unique, find the minimum datetime in each group, and return the corresponding seconds value in a new column called time. So the result would look something like:

id_not_unique  time
111111111      122
222222222      57

I have tried this:

def wait_first_call(df):
    min_time = min(df['datetime'])

    idx = df.index[df.datetime == min_time]
    
    first_time = df['seconds'].iloc[idx]
    
    return first_time

then

df.groupby(['id_not_unique']).apply(wait_first_call)

But I keep getting "IndexError: positional indexers are out-of-bounds", and I don't understand why. I thought apply took each group as a dataframe and applied the function to that group?

Any suggestions/help would be greatly appreciated.

asked Jan 30 '26 by confused_donkey


2 Answers

There are a few issues with your code:

  1. .iloc indexes by position (row number, column number), not by index label. If you pass index labels to .iloc and one of them is greater than the number of rows in the group, it throws exactly the out-of-bounds error you are seeing. You can resolve this by using .loc, which takes index labels.
  2. Use df['datetime'].idxmin() to get the index label of the minimum datetime value. The problem with df.index[df.datetime == min_time] is that it returns an Index of matches even when there is only one match (format: Index([ind])), so df['seconds'].loc[idx] gives you a Series rather than the scalar you want.
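To see both points concretely, here is a small sketch. The index labels 2 and 3 mimic the global index that apply passes along with each group, which is what makes .iloc blow up:

```python
import pandas as pd

# One group as apply() would see it: two rows, but index labels 2 and 3
# (the original positions in the full dataframe).
g = pd.DataFrame({'datetime': pd.to_datetime(['2020-07-12 11:21:09',
                                              '2019-04-21 14:22:42']),
                  'seconds': [26, 57]},
                 index=[2, 3])

idx = g.index[g.datetime == g['datetime'].min()]  # Index([3])
# g['seconds'].iloc[idx]  # IndexError: position 3 is out of bounds for 2 rows

print(g['seconds'].loc[idx])                     # label-based: works, but a Series
print(g['seconds'].loc[g['datetime'].idxmin()])  # a plain scalar: 57
```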

Use this code snippet:

def wait_first_call(df):
    idx = df['datetime'].idxmin()
    first_time = df['seconds'].loc[idx]
    return pd.Series({'time': first_time}, index=['time'])

df.groupby(['id_not_unique'], as_index=False).apply(wait_first_call)
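A quick self-contained check of the snippet above, with the frame rebuilt from the question's sample data:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'id_not_unique': [111111111, 111111111, 222222222, 222222222],
    'datetime': pd.to_datetime(['2020-08-26 15:44:58', '2020-08-28 15:33:45',
                                '2020-07-12 11:21:09', '2019-04-21 14:22:42']),
    'seconds': [122, 34, 26, 57]})

def wait_first_call(df):
    idx = df['datetime'].idxmin()          # index label of the earliest datetime
    first_time = df['seconds'].loc[idx]    # the matching seconds value (scalar)
    return pd.Series({'time': first_time}, index=['time'])

out = df.groupby(['id_not_unique'], as_index=False).apply(wait_first_call)
print(out)
```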

Update: Without Using .apply

I have found apply to be slow in general, so here is another approach:

First, get the index of the minimum datetime for each group in a helper column min_idx.

df['min_idx'] = (df.groupby('id_not_unique')['datetime']
                 .transform(lambda x: x.idxmin()))

Now filter the dataframe to the rows where the index equals the minimum index.

new_df = (df[df.index == df.min_idx]
            [['id_not_unique', 'seconds']]
            .rename(columns={'seconds': 'time'}))
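Putting the two steps together on the question's sample data (the frame is rebuilt from the post):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'id_not_unique': [111111111, 111111111, 222222222, 222222222],
    'datetime': pd.to_datetime(['2020-08-26 15:44:58', '2020-08-28 15:33:45',
                                '2020-07-12 11:21:09', '2019-04-21 14:22:42']),
    'seconds': [122, 34, 26, 57]})

# Step 1: broadcast each group's idxmin back onto every row of the group
df['min_idx'] = (df.groupby('id_not_unique')['datetime']
                 .transform(lambda x: x.idxmin()))

# Step 2: keep only the rows that are their own group's minimum
new_df = (df[df.index == df.min_idx]
            [['id_not_unique', 'seconds']]
            .rename(columns={'seconds': 'time'}))
print(new_df)
#    id_not_unique  time
# 0      111111111   122
# 3      222222222    57
```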
answered Feb 02 '26 by Amit Vikram Singh


Use DataFrame.sort_values with DataFrame.drop_duplicates, then remove the datetime column and rename:

df['datetime'] = pd.to_datetime(df['datetime'])

df = (df.sort_values(['id_not_unique', 'datetime'])
        .drop_duplicates('id_not_unique')
        .drop('datetime', axis=1)
        .rename(columns={'seconds': 'time'}))
print(df)
   id_not_unique     time
0      111111111      122
3      222222222       57

Or use DataFrameGroupBy.idxmin for index by minimal datetime and select by DataFrame.loc:

df['datetime'] = pd.to_datetime(df['datetime'])

df = (df.loc[df.groupby('id_not_unique')['datetime'].idxmin()]
        .drop('datetime', axis=1)
        .rename(columns={'seconds': 'time'}))
print(df)
   id_not_unique     time
0      111111111      122
3      222222222       57
answered Feb 02 '26 by jezrael