Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need to expand an inventory journal (log) pandas dataframe to include all dates per product id

I have an inventory journal that contains products and their relative inventory qty (resulting_qty) as well as the loss/gain every time inventory is added or subtracted (delta_qty).

The issue is that inventory records do not get updated daily, rather they are only updated when a change in inventory occurs. For this reason, it is difficult to extract the total inventory qty for all items on a given day, because some items are not recorded on certain days, despite the fact that they do have available inventory given their last entry resulting_qty was greater than 0. Logically, this would mean that an item went without a change in qty for a certain amount of days equal to the number of days between the max date and the last recorded date.

my data looks something like this, except in reality there are thousands of product ids

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2017-03-06 | 2017-03-06 12:24:22 | A   | 0         | 0.0           |
| 2017-03-31 | 2017-03-31 02:43:11 | A   | 3         | 3.0           |
| 2017-04-08 | 2017-04-08 22:04:35 | A   | -1        | 2.0           |
| 2017-04-12 | 2017-04-12 18:26:39 | A   | -1        | 1.0           |
| 2017-04-19 | 2017-04-19 09:15:38 | A   | -1        | 0.0           |
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-05-09 16:25:41 | C   | 2         | 2.0           |

Essentially, I need to make the data look something more like this so that I can simply pull a date and get the sum of total inventory for a given day when grouping by date (e.g. df.groupby(date).resulting_qty.sum()):

Note I removed the PID= A rows due to character limitations, but I hope you get the idea:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-06 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-07 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-08 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-09 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-10 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-11 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-12 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-13 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-14 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-15 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-16 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-17 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-18 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-19 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-20 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-21 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-24 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-25 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-26 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-27 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-28 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-29 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-30 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-01 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-02 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-03 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-04 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-05 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-06 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-07 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-08 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-10 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |

So far what I've done was created a series of loops that generates a date range between the min date of the product lifecycle and the max date of all products. I then append the last recorded row values as a new row with a new date if there is no information for said new date. I append these to lists, and then generate a new dataframe with the updated lists. The code is terribly slow and takes 2+ hours to complete on the total dataset:

date_list = []
pid_list= []
time_stamp_list = []
delta_qty_list = []
resulting_qty_list = []


timer = len(test.product_id.unique().tolist())
counter = 0
for product in test.product_id.unique().tolist():
    counter+=1
    print((counter/timer)*100)
    temp_df = test.query(f'product_id=={product}', engine='python')
    for idx,date in enumerate(pd.date_range(temp_df.index.min(),test.index.max()).tolist()):
        min_date= temp_df.index.min()
        if date.date() == min_date:
            date2=min_date
            pid = temp_df.loc[date2]['product_id']
            timestamp = temp_df.loc[date2]['timestamp']
            delta_qty = temp_df.loc[date2]['delta_qty']
            resulting_qty = temp_df.loc[date2]['resulting_qty']
            date_list.append(date2)
            pid_list.append(pid)
            delta_qty_list.append(delta_qty)
            time_stamp_list.append(timestamp)
            resulting_qty_list.append(resulting_qty)
        else:

            if date.date() in temp_df.index:
                date2= date.date()
                pid = temp_df.loc[date2]['product_id']
                timestamp = temp_df.loc[date2]['timestamp']
                delta_qty = temp_df.loc[date2]['delta_qty']
                resulting_qty = temp_df.loc[date2]['resulting_qty']
                date_list.append(date2)
                pid_list.append(pid)
                delta_qty_list.append(delta_qty)
                time_stamp_list.append(timestamp)
                resulting_qty_list.append(resulting_qty)
            elif date.date() > date2:
                date_list.append(date.date())
                pid_list.append(pid)
                time_stamp_list.append(timestamp)
                delta_qty_list.append(delta_qty)
                resulting_qty_list.append(resulting_qty)
            else:
                pass

Can someone please help me to understand what is the right way I should approach this as I'm 100% sure this is not the best approach.

Thank you

like image 500
Negative Correlation Avatar asked May 12 '19 19:05

Negative Correlation


People also ask

What is Cumprod in pandas?

Pandas DataFrame cumprod() Method The cumprod() method returns a DataFrame with the cumulative product for each row.

Which attributes is used to get the total number of elements in a DataFrame?

The total number of elements of pandas. DataFrame is stored in the size attribute.

How do pandas deal with missing dates?

To add missing dates to Python Pandas DataFrame, we can use the DatetimeIndex instance's reindex method. We create a date range index with idx = pd. date_range('09-01-2020', '09-30-2020') .

How to store results under an existing column in a Dataframe?

Alternatively, you may store the results under an existing DataFrame column. For example, let’s say that you created a DataFrame that has 12 numbers, where the last two numbers are zeros: You may then apply the following IF conditions, and then store the results under the existing ‘set_of_numbers’ column:

How to apply an IF condition in pandas Dataframe?

Applying an IF condition in Pandas DataFrame. Let’s now review the following 5 cases: (1) IF condition – Set of numbers. Suppose that you created a DataFrame in Python that has 10 numbers (from 1 to 10). You then want to apply the following IF conditions: If the number is equal or lower than 4, then assign the value of ‘True’

Should I use Lambda or Pandas for Dataframe?

You can achieve the same results by using either lambada, or just by sticking with Pandas. At the end, it boils down to working with the method that is best suited to your needs. Finally, you may want to check the following external source for additional information about Pandas DataFrame.

How do I get the set_of_numbers in Python using PANDAS?

For our example, the Python code would look like this: import pandas as pd numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]} df = pd.DataFrame (numbers,columns= ['set_of_numbers']) df.loc


1 Answers

The idea here is to reindex the DataFrame to fill the gaps you have.

Setup a DataFrame generated using your sample:

from io import StringIO

buffer = StringIO()
buffer.write('''\
date|timestamp|pid|delta_qty|resulting_qty
2017-03-06|2017-03-06 12:24:22|A|0|0.0          
2017-03-31|2017-03-31 02:43:11|A|3|3.0          
2017-04-08|2017-04-08 22:04:35|A|-1|2.0          
2017-04-12|2017-04-12 18:26:39|A|-1|1.0          
2017-04-19|2017-04-19 09:15:38|A|-1|0.0          
2019-01-16|2019-01-16 23:37:17|B|0|0.0          
2019-01-19|2019-01-19 09:40:38|C|0|0.0          
2019-04-05|2019-04-05 16:40:32|B|2|2.0          
2019-04-22|2019-04-22 11:06:33|B|-1|1.0          
2019-04-23|2019-04-23 13:23:17|B|-1|0.0          
2019-05-09|2019-05-09 16:25:41|C|2|2.0          
''')
buffer.seek(0)

df = pd.read_csv(buffer, sep='|', parse_dates=['date', 'timestamp'])

First, we generate a new, gap-less index between the min and max dates for each product. This has the effect of having no rows for a product after the last existing update, per your example. However, this step is easily customizable to suit the exact requirements you have. E.g., if you want the dates to span from first product entry to today, you can just set start and end manually.

from itertools import chain, cycle

date_ranges = df.groupby('pid').agg({'date': ['min', 'max']})

pairs = (zip(cycle([pid]), pd.date_range(start, end)) 
         for pid, (start, end) in date_ranges.iterrows())
new_index = pd.Index(chain.from_iterable(pairs), name=['pid', 'date'])

Then we apply the new index. Here we have two options:

  1. As per your example, we continue filling based on the last update exactly as is
  2. Fill delta_qty with 0 and the remaining columns per the last update (this is deviating from your request, but seemed logical and is only a minor change)

In either case, the two essential concepts are the .reindex method and the .fillna method. We can use reindex to expand the dense DataFrame to include all the dates but have sparse data. Then, we fill in the nans with the proper data. Since we're forward-padding from the last update, we want to specify method='ffill' per the docs

Method 1:

# this fills the rows per last update
results = df.set_index(['pid', 'date'])\
    .reindex(new_index).reset_index()
results.fillna(method='ffill', inplace=True)

This returns

    pid       date           timestamp  delta_qty  resulting_qty
0     A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1     A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2     A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3     A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..   ..        ...                 ...        ...            ...
24    A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25    A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..   ..        ...                 ...        ...            ...
29    A 2017-04-04 2017-03-31 02:43:11        3.0            3.0

for pid == 'A'

Method 2:

results = df.set_index(['pid', 'date'])\
    .reindex(new_index).reset_index()
results['delta_qty'].fillna(0, inplace=True)
results.fillna(method='ffill', inplace=True)

This returns:

    pid       date           timestamp  delta_qty  resulting_qty
0     A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1     A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2     A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3     A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..   ..        ...                 ...        ...            ...
24    A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25    A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..   ..        ...                 ...        ...            ...
29    A 2017-04-04 2017-03-31 02:43:11        0.0            3.0
like image 178
onepan Avatar answered Oct 12 '22 20:10

onepan