How to add a row to every group with pandas groupby?

I want to insert a new row at the top of each group. My raw DataFrame is:

import pandas as pd

df = pd.DataFrame({
    'ID': ['James', 'James', 'James','Max', 'Max', 'Max', 'Max','Park','Tom', 'Tom', 'Tom', 'Tom','Wong'],
    'From_num': [78, 420, 'Started', 298, 36, 298, 'Started', 'Started', 60, 520, 99, 'Started', 'Started'],
    'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18',
             '2019-06-20', '2019-01-30', '2018-10-23',
             '2018-08-29', '2020-05-21', '2019-11-22',
             '2019-08-26', '2018-12-11', '2018-10-09', '2019-02-01']})

It looks like this:

      ID From_num  To_num        Date
0   James       78      96  2020-05-12
1   James      420      78  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-06-20
4     Max       36      78  2019-01-30
5     Max      298      36  2018-10-23
6     Max  Started     298  2018-08-29
7    Park  Started     311  2020-05-21
8     Tom       60     150  2019-11-22
9     Tom      520     520  2019-08-26
10    Tom       99      78  2018-12-11
11    Tom  Started      99  2018-10-09
12   Wong  Started      39  2019-02-01

For each person ('ID'), I want to duplicate the first row of the group and insert the copy above it. The new row should keep the same 'ID', 'From_num' and 'To_num' values as the original first row, but its 'Date' should be the original first row's date plus one day. For example, for James the new row is 'James', 78, 96, '2020-05-13'. The same applies to every other group, so my expected result is:

       ID From_num  To_num        Date
0   James       78      96  2020-05-13  # row added, Date + 1
1   James       78      96  2020-05-12
2   James      420      78  2020-02-02
3   James  Started     420  2019-06-18
4     Max      298      36  2019-06-21  # row added, Date + 1
5     Max      298      36  2019-06-20
6     Max       36      78  2019-01-30
7     Max      298      36  2018-10-23
8     Max  Started     298  2018-08-29
9    Park  Started     311  2020-05-22  # Row added, Date + 1
10   Park  Started     311  2020-05-21
11    Tom       60     150  2019-11-23  # Row added, Date + 1
12    Tom       60     150  2019-11-22
13    Tom      520     520  2019-08-26
14    Tom       99      78  2018-12-11
15    Tom  Started      99  2018-10-09
16   Wong  Started      39  2019-02-02  # Row added Date + 1
17   Wong  Started      39  2019-02-01

I wrote a loop to do this, but it is quite slow. If you have any better ideas, please help. Thanks a lot.

asked Jul 26 '20 21:07

Alice jinx

1 Answer

Let's try groupby.apply here. We'll prepend a copy of the first row, with its date shifted by one day, to each group, like this:

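def augment_group(group):
    # Copy the group's first row so the original frame is not modified.
    first_row = group.iloc[[0]].copy()
    first_row['Date'] += pd.Timedelta(days=1)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    return pd.concat([first_row, group])

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
(df.groupby('ID', as_index=False, group_keys=False)
   .apply(augment_group)
   .reset_index(drop=True))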

       ID From_num  To_num       Date
0   James       78      96 2020-05-13
1   James       78      96 2020-05-12
2   James      420      78 2020-02-02
3   James  Started     420 2019-06-18
4     Max      298      36 2019-06-21
5     Max      298      36 2019-06-20
6     Max       36      78 2019-01-30
7     Max      298      36 2018-10-23
8     Max  Started     298 2018-08-29
9    Park  Started     311 2020-05-22
10   Park  Started     311 2020-05-21
11    Tom       60     150 2019-11-23
12    Tom       60     150 2019-11-22
13    Tom      520     520 2019-08-26
14    Tom       99      78 2018-12-11
15    Tom  Started      99 2018-10-09
16   Wong  Started      39 2019-02-02
17   Wong  Started      39 2019-02-01
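Since the question mentions speed, here is a rough sketch of a vectorized alternative that avoids calling a Python function per group. It is not part of the original answer. It assumes the rows of each ID are contiguous and that the first row of each group is the one to duplicate, as in the sample data. The shortened sample frame and the names `first_rows` and `result` are only for illustration.

```python
import pandas as pd

# Shortened version of the question's data, for illustration only.
df = pd.DataFrame({
    'ID': ['James', 'James', 'Max', 'Max', 'Park'],
    'From_num': [78, 'Started', 298, 'Started', 'Started'],
    'To_num': [96, 420, 36, 298, 311],
    'Date': ['2020-05-12', '2019-06-18', '2019-06-20',
             '2018-08-29', '2020-05-21'],
})
df['Date'] = pd.to_datetime(df['Date'])

# First row of each ID; drop_duplicates keeps the original index labels.
first_rows = df.drop_duplicates('ID').copy()
first_rows['Date'] += pd.Timedelta(days=1)

# Place the shifted copies ahead of the originals, then sort on the index
# with a stable algorithm so each copy lands directly above its source row.
result = (pd.concat([first_rows, df])
            .sort_index(kind='mergesort')
            .reset_index(drop=True))
print(result)
```

The stable sort matters here: each copy shares an index label with its source row, and a stable sort keeps the copy first because it comes first in the concatenated frame.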

That said, I agree with @Joran Beasley in the comments that this feels like an XY problem. Consider describing the underlying problem you're trying to solve, rather than asking how to implement what you think the solution is.

answered Oct 24 '22 22:10

cs95

Tags: python, pandas, dataframe, group-by, pandas-groupby
