How to add a row to every group with pandas groupby?

I want to add a new row at the top of each group. My raw DataFrame is:

import pandas as pd

df = pd.DataFrame({
    'ID': ['James', 'James', 'James','Max', 'Max', 'Max', 'Max','Park','Tom', 'Tom', 'Tom', 'Tom','Wong'],
    'From_num': [78, 420, 'Started', 298, 36, 298, 'Started', 'Started', 60, 520, 99, 'Started', 'Started'],
    'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18',
             '2019-06-20', '2019-01-30', '2018-10-23',
             '2018-08-29', '2020-05-21', '2019-11-22',
             '2019-08-26', '2018-12-11', '2018-10-09', '2019-02-01']})

It looks like this:

      ID From_num  To_num        Date
0   James       78      96  2020-05-12
1   James      420      78  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-06-20
4     Max       36      78  2019-01-30
5     Max      298      36  2018-10-23
6     Max  Started     298  2018-08-29
7    Park  Started     311  2020-05-21
8     Tom       60     150  2019-11-22
9     Tom      520     520  2019-08-26
10    Tom       99      78  2018-12-11
11    Tom  Started      99  2018-10-09
12   Wong  Started      39  2019-02-01

For each person ('ID'), I want to duplicate the first row of the group and insert the copy above it. The copy should keep the same 'ID', 'From_num' and 'To_num' values as the original first row, but its 'Date' should be that row's Date plus one day. For James, for example, the new row is 'James', 78, 96, '2020-05-13'. The same applies to the other groups, so my expected result is:

       ID From_num  To_num        Date
0   James       78      96  2020-05-13  # row added, Date + 1
1   James       78      96  2020-05-12
2   James      420      78  2020-02-02
3   James  Started     420  2019-06-18
4     Max      298      36  2019-06-21  # row added, Date + 1
5     Max      298      36  2019-06-20
6     Max       36      78  2019-01-30
7     Max      298      36  2018-10-23
8     Max  Started     298  2018-08-29
9    Park  Started     311  2020-05-22  # Row added, Date + 1
10   Park  Started     311  2020-05-21
11    Tom       60     150  2019-11-23  # Row added, Date + 1
12    Tom       60     150  2019-11-22
13    Tom      520     520  2019-08-26
14    Tom       99      78  2018-12-11
15    Tom  Started      99  2018-10-09
16   Wong  Started      39  2019-02-02  # Row added Date + 1
17   Wong  Started      39  2019-02-01

I wrote some loops for this, but they are quite slow. If you have any good ideas, please help. Thanks a lot.

Alice jinx asked Jul 26 '20 21:07


1 Answer

Let's try groupby.apply here. We'll prepend a copy of the first row, with its Date shifted by one day, to each group:

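def augment_group(group):
    # Copy the first row so the edit below does not touch the original frame
    # (and does not raise SettingWithCopyWarning).
    first_row = group.iloc[[0]].copy()
    first_row['Date'] += pd.Timedelta(days=1)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    return pd.concat([first_row, group])

# Dates must be real datetimes before a Timedelta can be added to them.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
(df.groupby('ID', as_index=False, group_keys=False)
   .apply(augment_group)
   .reset_index(drop=True))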

       ID From_num  To_num       Date
0   James       78      96 2020-05-13
1   James       78      96 2020-05-12
2   James      420      78 2020-02-02
3   James  Started     420 2019-06-18
4     Max      298      36 2019-06-21
5     Max      298      36 2019-06-20
6     Max       36      78 2019-01-30
7     Max      298      36 2018-10-23
8     Max  Started     298 2018-08-29
9    Park  Started     311 2020-05-22
10   Park  Started     311 2020-05-21
11    Tom       60     150 2019-11-23
12    Tom       60     150 2019-11-22
13    Tom      520     520 2019-08-26
14    Tom       99      78 2018-12-11
15    Tom  Started      99 2018-10-09
16   Wong  Started      39 2019-02-02
17   Wong  Started      39 2019-02-01
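
Since the question mentions speed, here is a sketch of a variant that avoids calling a Python function per group. It is not part of the original answer. It assumes the frame still has its default, increasing integer index, so that a stable sort on the index places each copied row directly above the row it came from. The data is a shortened version of the question's, and the names first_rows and result are only illustrative.

```python
import pandas as pd

# Shortened version of the question's data (two groups only).
df = pd.DataFrame({
    'ID': ['James', 'James', 'James', 'Park'],
    'From_num': [78, 420, 'Started', 'Started'],
    'To_num': [96, 78, 420, 311],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18', '2020-05-21'],
})
df['Date'] = pd.to_datetime(df['Date'])

# First row of every group; each keeps its original index label.
first_rows = df.drop_duplicates('ID').copy()
first_rows['Date'] += pd.Timedelta(days=1)

# Put the copies ahead of the originals, then sort on the index with a
# stable algorithm so each copy stays above the row it duplicates.
result = (pd.concat([first_rows, df])
            .sort_index(kind='mergesort')
            .reset_index(drop=True))
print(result)
```

If the index is not a plain increasing range, calling df.reset_index(drop=True) first should restore the assumption this sketch relies on.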

That said, I agree with @Joran Beasley in the comments that this feels somewhat like an XY problem. Perhaps try clarifying the problem you're actually trying to solve, instead of asking how to implement what you think is the solution?

cs95 answered Oct 24 '22 22:10