Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to distribute pandas dataframe rows unevenly across timestamps based on value of the column

E.g. DF which contains number of executions across timestamps.

             DateTime           Execution
0 2023-04-03 07:00:00                   11
1 2023-04-03 11:00:00                   1
2 2023-04-03 12:00:00                   1
3 2023-04-03 14:00:00                   3
4 2023-04-03 18:00:00                   1

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5080 entries, 0 to 5079

Below is the output I'm trying to achieve

             DateTime           Execution
0 2023-04-03 07:00:00                   4
1 2023-04-03 08:00:00                   4
2 2023-04-03 09:00:00                   3
3 2023-04-03 11:00:00                   1
4 2023-04-03 12:00:00                   1
5 2023-04-03 14:00:00                   3
6 2023-04-03 18:00:00                   1

Only if the execution is more than 4, it should be distributed to the next hours. Maximum for any hour is 4.

Thanks again for quick help.

How to distribute pandas dataframe rows evenly across timestamps based on value of the column

This helps with Evenly distribution, I'm looking at uneven distribution.

like image 995
GKAK Avatar asked Sep 02 '25 15:09

GKAK


2 Answers

With asfreq/clip :

N, C = 4, "Execution"

asfreq = df.set_index("DateTime").asfreq("h")

out = (
    (gby:=asfreq.groupby(asfreq[C].notna().cumsum()))[C]
    .transform("first")
    .sub(gby.cumcount() * N)
    .clip(upper=N)
    .loc[lambda s: s.gt(0)]
    .reset_index(name=C)
    .convert_dtypes()
)

Output :

             DateTime  Execution
0 2023-04-03 07:00:00          4
1 2023-04-03 08:00:00          4
2 2023-04-03 09:00:00          3
3 2023-04-03 11:00:00          1
4 2023-04-03 12:00:00          1
5 2023-04-03 14:00:00          3
6 2023-04-03 18:00:00          1
like image 156
Timeless Avatar answered Sep 04 '25 14:09

Timeless


Closer to jezrael's solution from the link above:

import pandas as pd
import datetime

values = [11, 2, 4, 1, 1, 1, 1, 1]
datetimes = [
    datetime.datetime.now() + datetime.timedelta(hours=i * 3)
    for i in range(len(values))
]

df = pd.DataFrame(
    {
        "DateTime": datetimes,
        "Execution": values,
    }
)


boundary = 4
div, mod = df.Execution.divmod(boundary)
num_repeat = div * (mod > 0)

new = df.loc[df.index.repeat(num_repeat)].assign(
    Execution=lambda x: ((x.groupby(level=0).cumcount() + 1) < div[x.index]).apply(
        lambda y: boundary if y else mod[x.index].drop_duplicates()[0]
    ),
    DateTime=lambda x: x["DateTime"]
    + pd.to_timedelta(x.groupby(level=0).cumcount() + 1, unit="h"),
)

df_modified = df.copy()
df_modified.loc[df_modified.Execution > 4, "Execution"] = 4

res = pd.concat([df_modified, new]).sort_values("DateTime").reset_index(drop=True)   
like image 30
Ken Jiiii Avatar answered Sep 04 '25 15:09

Ken Jiiii