I have an inventory journal that contains products and their current inventory quantity (resulting_qty), as well as the loss/gain recorded every time inventory is added or subtracted (delta_qty).
The issue is that inventory records are not updated daily; they are only updated when a change in inventory occurs. For this reason it is difficult to extract the total inventory quantity across all items on a given day, because some items have no record on certain days even though they do have available inventory (their last recorded resulting_qty was greater than 0). Logically, this means the item simply went without a change in quantity for the number of days between the max date and its last recorded date.
My data looks something like this, except in reality there are thousands of product ids:
| date | timestamp | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2017-03-06 | 2017-03-06 12:24:22 | A | 0 | 0.0 |
| 2017-03-31 | 2017-03-31 02:43:11 | A | 3 | 3.0 |
| 2017-04-08 | 2017-04-08 22:04:35 | A | -1 | 2.0 |
| 2017-04-12 | 2017-04-12 18:26:39 | A | -1 | 1.0 |
| 2017-04-19 | 2017-04-19 09:15:38 | A | -1 | 0.0 |
| 2019-01-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-05 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-22 | 2019-04-22 11:06:33 | B | -1 | 1.0 |
| 2019-04-23 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-09 | 2019-05-09 16:25:41 | C | 2 | 2.0 |
Essentially, I need to make the data look something more like the table below, so that I can simply pick a date and get the total inventory for that day by grouping by date (e.g. df.groupby('date').resulting_qty.sum()):
Note: I removed the pid = A rows due to character limitations, but I hope you get the idea:
| date | timestamp | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2019-01-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-17 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-18 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-19 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-20 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-21 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-22 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-23 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-24 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-25 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-26 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-27 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-28 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-29 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-30 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-31 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-01 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-02 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-03 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-04 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-05 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-06 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-07 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-08 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-09 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-10 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-11 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-12 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-13 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-14 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-15 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-17 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-18 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-19 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-20 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-21 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-22 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-23 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-24 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-25 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-26 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-27 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-28 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-01 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-02 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-03 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-04 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-05 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-06 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-07 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-08 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-09 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-10 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-11 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-12 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-13 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-14 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-15 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-17 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-18 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-19 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-20 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-21 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-22 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-23 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-24 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-25 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-26 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-27 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-28 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-29 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-30 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-31 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-01 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-02 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-03 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-04 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-05 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-06 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-07 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-08 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-09 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-10 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-11 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-12 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-13 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-14 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-15 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-16 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-17 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-18 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-19 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-20 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-21 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-22 | 2019-04-22 11:06:33 | B | -1 | 1.0 |
| 2019-04-23 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-24 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-25 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-26 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-27 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-28 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-29 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-30 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-01 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-02 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-03 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-04 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-05 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-06 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-07 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-08 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-09 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-10 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-01-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-20 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-21 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-22 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-23 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-24 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-25 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-26 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-27 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-28 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-29 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-30 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-31 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-01 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-02 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-03 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-04 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-05 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-06 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-07 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-08 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-09 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-10 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-11 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-12 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-13 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-14 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-15 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-16 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-17 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-18 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-20 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-21 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-22 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-23 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-24 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-25 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-26 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-27 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-28 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-01 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-02 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-03 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-04 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-05 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-06 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-07 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-08 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-09 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-10 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-11 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-12 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-13 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-14 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-15 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-16 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-17 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-18 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-20 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-21 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-22 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-23 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-24 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-25 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-26 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-27 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-28 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-29 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-30 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-31 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-01 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-02 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-03 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-04 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-05 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-06 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-07 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-08 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-09 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-10 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-11 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-12 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-13 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-14 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-15 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-16 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-17 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
So far, what I've done is write a series of loops that generate a date range between each product's first recorded date and the max date across all products. Whenever there is no record for one of those dates, I append the last recorded row values as a new row with the new date. I collect everything in lists and then build a new dataframe from them. The code is terribly slow and takes 2+ hours to complete on the full dataset:
date_list = []
pid_list = []
time_stamp_list = []
delta_qty_list = []
resulting_qty_list = []
timer = len(test.product_id.unique().tolist())
counter = 0
for product in test.product_id.unique().tolist():
    counter += 1
    print((counter / timer) * 100)  # crude progress indicator
    temp_df = test.query(f'product_id=={product}', engine='python')
    # walk every calendar day from this product's first record to the overall max date
    for idx, date in enumerate(pd.date_range(temp_df.index.min(), test.index.max()).tolist()):
        min_date = temp_df.index.min()
        if date.date() == min_date:
            # first day of the product's lifecycle: copy the first recorded row
            date2 = min_date
            pid = temp_df.loc[date2]['product_id']
            timestamp = temp_df.loc[date2]['timestamp']
            delta_qty = temp_df.loc[date2]['delta_qty']
            resulting_qty = temp_df.loc[date2]['resulting_qty']
            date_list.append(date2)
            pid_list.append(pid)
            delta_qty_list.append(delta_qty)
            time_stamp_list.append(timestamp)
            resulting_qty_list.append(resulting_qty)
        else:
            if date.date() in temp_df.index:
                # a real record exists on this day: take it and remember it
                date2 = date.date()
                pid = temp_df.loc[date2]['product_id']
                timestamp = temp_df.loc[date2]['timestamp']
                delta_qty = temp_df.loc[date2]['delta_qty']
                resulting_qty = temp_df.loc[date2]['resulting_qty']
                date_list.append(date2)
                pid_list.append(pid)
                delta_qty_list.append(delta_qty)
                time_stamp_list.append(timestamp)
                resulting_qty_list.append(resulting_qty)
            elif date.date() > date2:
                # no record on this day: repeat the last recorded values under the new date
                date_list.append(date.date())
                pid_list.append(pid)
                time_stamp_list.append(timestamp)
                delta_qty_list.append(delta_qty)
                resulting_qty_list.append(resulting_qty)
            else:
                pass
Can someone please help me understand the right way to approach this? I'm 100% sure this is not the best approach.
Thank you
The idea here is to reindex the DataFrame to fill the gaps you have.
Set up a DataFrame generated from your sample:
import pandas as pd
from io import StringIO

buffer = StringIO()
buffer.write('''\
date|timestamp|pid|delta_qty|resulting_qty
2017-03-06|2017-03-06 12:24:22|A|0|0.0
2017-03-31|2017-03-31 02:43:11|A|3|3.0
2017-04-08|2017-04-08 22:04:35|A|-1|2.0
2017-04-12|2017-04-12 18:26:39|A|-1|1.0
2017-04-19|2017-04-19 09:15:38|A|-1|0.0
2019-01-16|2019-01-16 23:37:17|B|0|0.0
2019-01-19|2019-01-19 09:40:38|C|0|0.0
2019-04-05|2019-04-05 16:40:32|B|2|2.0
2019-04-22|2019-04-22 11:06:33|B|-1|1.0
2019-04-23|2019-04-23 13:23:17|B|-1|0.0
2019-05-09|2019-05-09 16:25:41|C|2|2.0
''')
buffer.seek(0)
df = pd.read_csv(buffer, sep='|', parse_dates=['date', 'timestamp'])
First, we generate a new, gap-less index between the min and max dates for each product. This has the effect of having no rows for a product after its last existing update, per your example. However, this step is easy to customize to your exact requirements: e.g., if you want the dates to span from the first product entry to today, you can just set start and end manually.
from itertools import chain, cycle
date_ranges = df.groupby('pid').agg({'date': ['min', 'max']})
pairs = (zip(cycle([pid]), pd.date_range(start, end))
for pid, (start, end) in date_ranges.iterrows())
new_index = pd.Index(chain.from_iterable(pairs), name=['pid', 'date'])
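As a minimal sketch of that customization (the names today, pairs_to_today, and new_index_to_today are my own, not part of the original): build the same pairs but with a fixed end date of today, so every product's range runs from its first entry up to the present:

today = pd.Timestamp.today().normalize()  # today's date, with the time-of-day stripped
# same construction as above, but the range for every product ends today (end is ignored)
pairs_to_today = (zip(cycle([pid]), pd.date_range(start, today))
                  for pid, (start, end) in date_ranges.iterrows())
new_index_to_today = pd.Index(chain.from_iterable(pairs_to_today), name=['pid', 'date'])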
Then we apply the new index. Here we have two options:

1. fill all of the columns per the last update, or
2. fill delta_qty with 0 and the remaining columns per the last update (this deviates slightly from your request, but seemed logical and is only a minor change).

In either case, the two essential concepts are the .reindex method and the .fillna method. We use reindex to expand the dense DataFrame to include all the dates, which leaves the new rows sparse. Then we fill in the NaNs with the proper data. Since we're forward-padding from the last update, we want to specify method='ffill' per the docs.
# this fills the rows per last update
results = df.set_index(['pid', 'date'])\
.reindex(new_index).reset_index()
results.fillna(method='ffill', inplace=True)
This returns (output truncated to pid == 'A'):
pid date timestamp delta_qty resulting_qty
0 A 2017-03-06 2017-03-06 12:24:22 0.0 0.0
1 A 2017-03-07 2017-03-06 12:24:22 0.0 0.0
2 A 2017-03-08 2017-03-06 12:24:22 0.0 0.0
3 A 2017-03-09 2017-03-06 12:24:22 0.0 0.0
.. .. ... ... ... ...
24 A 2017-03-30 2017-03-06 12:24:22 0.0 0.0
25 A 2017-03-31 2017-03-31 02:43:11 3.0 3.0
.. .. ... ... ... ...
29 A 2017-04-04 2017-03-31 02:43:11 3.0 3.0
And the second option, filling delta_qty with 0 before forward-filling the remaining columns:
results = df.set_index(['pid', 'date'])\
.reindex(new_index).reset_index()
results['delta_qty'].fillna(0, inplace=True)
results.fillna(method='ffill', inplace=True)
This returns:
pid date timestamp delta_qty resulting_qty
0 A 2017-03-06 2017-03-06 12:24:22 0.0 0.0
1 A 2017-03-07 2017-03-06 12:24:22 0.0 0.0
2 A 2017-03-08 2017-03-06 12:24:22 0.0 0.0
3 A 2017-03-09 2017-03-06 12:24:22 0.0 0.0
.. .. ... ... ... ...
24 A 2017-03-30 2017-03-06 12:24:22 0.0 0.0
25 A 2017-03-31 2017-03-31 02:43:11 3.0 3.0
.. .. ... ... ... ...
29 A 2017-04-04 2017-03-31 02:43:11 0.0 3.0
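Either way, once the gaps are filled, the daily total you were after becomes a one-line aggregation. A minimal usage sketch (daily_totals and the example date are mine, purely for illustration):

# sum resulting_qty over all products for each calendar day
daily_totals = results.groupby('date')['resulting_qty'].sum()
# total inventory available on a single day, e.g.:
daily_totals.loc['2019-04-10']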