Backfill a pandas DataFrame column using a condition

Question

I have a pandas DataFrame with 50 million records, and I am trying to backfill a column based on a condition. In the sample below, the timestamps for the names 800A and Barber line up, so I assume the rows belong to the same person and the name was simply recorded incorrectly. The same goes for the 800A rows that precede Mia.

This is just sample data; my DataFrame looks like this:

datetime       name     dischargeDate       HR Sp  x_inc   vs_inc  rec_num
01-05 18:04:50  Zawisza  14-01-05 18:05:00  119 98  FALSE   TRUE    6458445
01-05 18:04:55  Zawisza  14-01-05 18:05:00  120 97  FALSE   TRUE    6458445
01-05 18:05:00  Zawisza  14-01-05 18:05:00          FALSE   FALSE
01-29 17:58:45  800A     14-01-29 17:59:10          FALSE   FALSE
01-29 17:58:50  800A     14-01-29 17:59:10  139     FALSE   TRUE
01-29 17:58:55  800A     14-01-29 17:59:10  138     FALSE   TRUE
01-29 17:59:00  800A     14-01-29 17:59:10  138 96  FALSE   TRUE
01-29 17:59:15  Barber   14-01-29 18:17:15  138 96  FALSE   TRUE    7192783
01-29 17:59:20  Barber   14-01-29 18:17:15  138 96  FALSE   TRUE    7192783
01-29 17:59:25  Barber   14-01-29 18:17:15  138 95  FALSE   TRUE    7192783
03-04 21:19:45  800A     15-03-05 01:00:15          FALSE   FALSE
03-05 00:53:10  800A     15-03-05 01:00:15          FALSE   FALSE
03-05 00:55:50  800A     15-03-05 01:00:15      94  FALSE   TRUE
03-05 00:55:55  800A     15-03-05 01:00:15  81  93  FALSE   TRUE
03-05 00:56:00  800A     15-03-05 01:00:15  89  93  FALSE   TRUE
03-05 01:00:20  Mia      15-03-05 04:13:15  70  93  FALSE   TRUE    6728923
03-05 01:00:25  Mia      15-03-05 04:13:15  70  93  FALSE   TRUE    6728923
03-05 01:00:30  Mia      15-03-05 04:13:15  70  94  FALSE   TRUE    6728923

I am trying to backfill the record number column (rec_num), but the fill should stop at rows where both x_inc and vs_inc are False: those rows should keep their missing rec_num.

Actual output:

datetime       name     dischargeDate       HR Sp  x_inc   vs_inc  rec_num
01-05 18:04:50  Zawisza  14-01-05 18:05:00  119 98  FALSE   TRUE    6458445
01-05 18:04:55  Zawisza  14-01-05 18:05:00  120 97  FALSE   TRUE    6458445
01-05 18:05:00  Zawisza  14-01-05 18:05:00          FALSE   FALSE
01-29 17:58:45  800A     14-01-29 17:59:10          FALSE   FALSE
01-29 17:58:50  800A     14-01-29 17:59:10  139     FALSE   TRUE
01-29 17:58:55  800A     14-01-29 17:59:10  138     FALSE   TRUE
01-29 17:59:00  800A     14-01-29 17:59:10  138 96  FALSE   TRUE
01-29 17:59:15  Barber   14-01-29 18:17:15  138 96  FALSE   TRUE    7192783
01-29 17:59:20  Barber   14-01-29 18:17:15  138 96  FALSE   TRUE    7192783
01-29 17:59:25  Barber   14-01-29 18:17:15  138 95  FALSE   TRUE    7192783
03-04 21:19:45  800A     15-03-05 01:00:15          FALSE   FALSE
03-05 00:53:10  800A     15-03-05 01:00:15          FALSE   FALSE
03-05 00:55:50  800A     15-03-05 01:00:15      94  FALSE   TRUE
03-05 00:55:55  800A     15-03-05 01:00:15  81  93  FALSE   TRUE
03-05 00:56:00  800A     15-03-05 01:00:15  89  93  FALSE   TRUE
03-05 01:00:20  Mia      15-03-05 04:13:15  70  93  FALSE   TRUE    6728923
03-05 01:00:25  Mia      15-03-05 04:13:15  70  93  FALSE   TRUE    6728923
03-05 01:00:30  Mia      15-03-05 04:13:15  70  94  FALSE   TRUE    6728923

Expected output:

datetime       name     dischargeDate       HR Sp  x_inc   vs_inc  rec_num
01-05 18:04:50  Zawisza  14-01-05 18:05:00  119 98  FALSE   TRUE    6458445
01-05 18:04:55  Zawisza  14-01-05 18:05:00  120 97  FALSE   TRUE    6458445
01-05 18:05:00  Zawisza  14-01-05 18:05:00          FALSE   FALSE
01-29 17:58:45  800A     14-01-29 17:59:10          FALSE   FALSE
01-29 17:58:50  800A     14-01-29 17:59:10  139     FALSE   TRUE    7192783
01-29 17:58:55  800A     14-01-29 17:59:10  138     FALSE   TRUE    7192783
01-29 17:59:00  800A     14-01-29 17:59:10  138 96  FALSE   TRUE    7192783
01-29 17:59:15  Barber   14-01-29 18:17:15  138 96  FALSE   TRUE    7192783
01-29 17:59:20  Barber   14-01-29 18:17:15  138 96  FALSE   TRUE    7192783
01-29 17:59:25  Barber   14-01-29 18:17:15  138 95  FALSE   TRUE    7192783
03-04 21:19:45  800A     15-03-05 01:00:15          FALSE   FALSE
03-05 00:53:10  800A     15-03-05 01:00:15          FALSE   FALSE
03-05 00:55:50  800A     15-03-05 01:00:15      94  FALSE   TRUE    6728923
03-05 00:55:55  800A     15-03-05 01:00:15  81  93  FALSE   TRUE    6728923
03-05 00:56:00  800A     15-03-05 01:00:15  89  93  FALSE   TRUE    6728923
03-05 01:00:20  Mia      15-03-05 04:13:15  70  93  FALSE   TRUE    6728923
03-05 01:00:25  Mia      15-03-05 04:13:15  70  93  FALSE   TRUE    6728923
03-05 01:00:30  Mia      15-03-05 04:13:15  70  94  FALSE   TRUE    6728923

I am using df['rec_num'].fillna(method='bfill'), but it fills every missing value, which is not what I want. I would appreciate any suggestions for this problem, or a better approach if there is one. Thanks in advance.
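For reference, the behaviour described above can be reproduced on a small frame. This is only a sketch: the frame is a hypothetical miniature of the sample data (a few rows, only the relevant columns), and it uses Series.bfill(), which is the non-deprecated spelling of fillna(method='bfill') in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sample data: only the columns that matter here.
df = pd.DataFrame({
    "name":    ["Zawisza", "Zawisza", "800A", "800A", "800A", "Barber", "Barber"],
    "x_inc":   [False, False, False, False, False, False, False],
    "vs_inc":  [True, False, False, True, True, True, True],
    "rec_num": [6458445, np.nan, np.nan, np.nan, np.nan, 7192783, 7192783],
})

# Plain backfill: every gap takes the next valid rec_num,
# including the rows where both flags are False.
filled = df["rec_num"].bfill()
print(filled.tolist())
```

Rows 1 and 2 (both flags False) end up with Barber's record number as well, which is the over-filling the question is about.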

anky · Accepted Answer

You can do this with a boolean mask and np.where():

import numpy as np
# Boolean mask: True on rows where both x_inc and vs_inc are False
cond = (df.x_inc == False) & (df.vs_inc == False)
# Take the backfilled value where the mask is False; masked rows keep their original rec_num
df['new_rec'] = np.where(~cond, df.rec_num.bfill(), df.rec_num)
print(df)

Note: you can assign the result back to rec_num instead of creating a new column; I added new_rec only so you can compare the two. This should also be fast, since both the mask and the backfill are vectorized.

    datetime            name    dischargeDate       HR      Sp      x_inc   vs_inc  rec_num     new_rec
0   2019-05-01 18:04:50 Zawisza 2005-01-14 18:05:00 119.0   98.0    False   True    6458445.0   6458445.0
1   2019-05-01 18:04:55 Zawisza 2005-01-14 18:05:00 120.0   97.0    False   True    6458445.0   6458445.0
2   2019-05-01 18:05:00 Zawisza 2005-01-14 18:05:00 NaN     NaN     False   False   NaN         NaN
3   2029-01-01 17:58:45 800A    2029-01-14 17:59:10 NaN     NaN     False   False   NaN         NaN
4   2029-01-01 17:58:50 800A    2029-01-14 17:59:10 139.0   NaN     False   True    NaN         7192783.0
5   2029-01-01 17:58:55 800A    2029-01-14 17:59:10 138.0   NaN     False   True    NaN         7192783.0
...
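As a self-contained variant of the same idea, here is a sketch that writes the result back into rec_num using Series.where instead of np.where. The frame is a hypothetical miniature of the sample data, and the code assumes x_inc and vs_inc are real boolean columns; if they were read in as the strings "TRUE"/"FALSE", they would need converting first.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the sample data (assumed boolean flag columns).
df = pd.DataFrame({
    "name":    ["Zawisza", "Zawisza", "800A", "800A", "800A", "Barber"],
    "x_inc":   [False] * 6,
    "vs_inc":  [True, False, False, True, True, True],
    "rec_num": [6458445, np.nan, np.nan, np.nan, np.nan, 7192783],
})

# Rows where both flags are False must keep their original (missing) value.
both_false = ~df["x_inc"] & ~df["vs_inc"]

# Series.where keeps the backfilled value where the condition holds
# and falls back to the original rec_num everywhere else.
df["rec_num"] = df["rec_num"].bfill().where(~both_false, df["rec_num"])
print(df)
```

The 800A rows with vs_inc set to True pick up Barber's record number, while the two rows with both flags False stay NaN. Whether this or np.where is faster on 50 million rows is not something the sketch measures; both avoid Python-level loops.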

Tags: python-3.x, pandas, dataframe, data-manipulation

Abalan Musk
