How to distribute pandas dataframe rows unevenly across timestamps based on value of the column

Question

E.g. DF which contains number of executions across timestamps.

             DateTime           Execution
0 2023-04-03 07:00:00                   11
1 2023-04-03 11:00:00                   1
2 2023-04-03 12:00:00                   1
3 2023-04-03 14:00:00                   3
4 2023-04-03 18:00:00                   1

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5080 entries, 0 to 5079

Below is the output I'm trying to achieve

             DateTime           Execution
0 2023-04-03 07:00:00                   4
1 2023-04-03 08:00:00                   4
2 2023-04-03 09:00:00                   3
3 2023-04-03 11:00:00                   1
4 2023-04-03 12:00:00                   1
5 2023-04-03 14:00:00                   3
6 2023-04-03 18:00:00                   1

Only if the execution is more than 4, it should be distributed to the next hours. Maximum for any hour is 4.

Thanks again for quick help.

How to distribute pandas dataframe rows evenly across timestamps based on value of the column

This helps with Evenly distribution, I'm looking at uneven distribution.

Timeless · Accepted Answer

With asfreq/clip :

N, C = 4, "Execution"

asfreq = df.set_index("DateTime").asfreq("h")

out = (
    (gby:=asfreq.groupby(asfreq[C].notna().cumsum()))[C]
    .transform("first")
    .sub(gby.cumcount() * N)
    .clip(upper=N)
    .loc[lambda s: s.gt(0)]
    .reset_index(name=C)
    .convert_dtypes()
)

Output :

             DateTime  Execution
0 2023-04-03 07:00:00          4
1 2023-04-03 08:00:00          4
2 2023-04-03 09:00:00          3
3 2023-04-03 11:00:00          1
4 2023-04-03 12:00:00          1
5 2023-04-03 14:00:00          3
6 2023-04-03 18:00:00          1

Ken Jiiii · Answer

Closer to jezrael's solution from the link above:

import pandas as pd
import datetime

values = [11, 2, 4, 1, 1, 1, 1, 1]
datetimes = [
    datetime.datetime.now() + datetime.timedelta(hours=i * 3)
    for i in range(len(values))
]

df = pd.DataFrame(
    {
        "DateTime": datetimes,
        "Execution": values,
    }
)


boundary = 4
div, mod = df.Execution.divmod(boundary)
num_repeat = div * (mod > 0)

new = df.loc[df.index.repeat(num_repeat)].assign(
    Execution=lambda x: ((x.groupby(level=0).cumcount() + 1) < div[x.index]).apply(
        lambda y: boundary if y else mod[x.index].drop_duplicates()[0]
    ),
    DateTime=lambda x: x["DateTime"]
    + pd.to_timedelta(x.groupby(level=0).cumcount() + 1, unit="h"),
)

df_modified = df.copy()
df_modified.loc[df_modified.Execution > 4, "Execution"] = 4

res = pd.concat([df_modified, new]).sort_values("DateTime").reset_index(drop=True)

How to distribute pandas dataframe rows unevenly across timestamps based on value of the column

Tags:

python

pandas

dataframe

GKAK

2 Answers

Timeless

Ken Jiiii

Recent Activity

Donate For Us

How to distribute pandas dataframe rows unevenly across timestamps based on value of the column

Tags:

python

pandas

dataframe

GKAK

2 Answers

Timeless

Ken Jiiii

Related questions

Recent Activity

Donate For Us