For example, a DataFrame that contains the number of executions across timestamps:
DateTime Execution
0 2023-04-03 07:00:00 11
1 2023-04-03 11:00:00 1
2 2023-04-03 12:00:00 1
3 2023-04-03 14:00:00 3
4 2023-04-03 18:00:00 1
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5080 entries, 0 to 5079
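For reproducibility, the five sample rows above can be rebuilt with (a minimal setup sketch using only the data shown here):

```python
import pandas as pd

# rebuild the five sample rows shown above
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2023-04-03 07:00:00",
        "2023-04-03 11:00:00",
        "2023-04-03 12:00:00",
        "2023-04-03 14:00:00",
        "2023-04-03 18:00:00",
    ]),
    "Execution": [11, 1, 1, 3, 1],
})
print(df)
```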
Below is the output I'm trying to achieve:
DateTime Execution
0 2023-04-03 07:00:00 4
1 2023-04-03 08:00:00 4
2 2023-04-03 09:00:00 3
3 2023-04-03 11:00:00 1
4 2023-04-03 12:00:00 1
5 2023-04-03 14:00:00 3
6 2023-04-03 18:00:00 1
Only when an execution count exceeds 4 should the excess be distributed to the following hours; the maximum for any hour is 4.
Thanks again for the quick help.
How to distribute pandas dataframe rows evenly across timestamps based on value of the column
That helps with even distribution; I'm looking for uneven distribution.
With asfreq/clip:
N, C = 4, "Execution"

# one row per hour; hours absent from df get NaN
asfreq = df.set_index("DateTime").asfreq("h")

out = (
    # group each original row with the empty hours that follow it
    (gby := asfreq.groupby(asfreq[C].notna().cumsum()))[C]
    .transform("first")       # broadcast the group's value over its hours
    .sub(gby.cumcount() * N)  # subtract N for every hour already filled
    .clip(upper=N)            # cap each hour at N
    .loc[lambda s: s.gt(0)]   # drop hours with nothing left to place
    .reset_index(name=C)
    .convert_dtypes()
)
Output:
DateTime Execution
0 2023-04-03 07:00:00 4
1 2023-04-03 08:00:00 4
2 2023-04-03 09:00:00 3
3 2023-04-03 11:00:00 1
4 2023-04-03 12:00:00 1
5 2023-04-03 14:00:00 3
6 2023-04-03 18:00:00 1
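To see why the grouping works, here is a small sketch of the intermediate step, reusing the question's sample data: `asfreq("h")` fills every missing hour with a NaN row, so `notna().cumsum()` assigns each original row, together with the empty hours that follow it, to one group.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2023-04-03 07:00:00", "2023-04-03 11:00:00",
        "2023-04-03 12:00:00", "2023-04-03 14:00:00",
        "2023-04-03 18:00:00",
    ]),
    "Execution": [11, 1, 1, 3, 1],
})

asfreq = df.set_index("DateTime").asfreq("h")   # one row per hour, NaN where absent
groups = asfreq["Execution"].notna().cumsum()   # group id per hourly row
print(groups.tolist())
```

The 07:00 row and the empty 08:00-10:00 hours all land in group 1, so `transform("first")` broadcasts 11 over those four hours before `cumcount() * N` is subtracted and the result is clipped.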
Closer to jezrael's solution from the link above:
import datetime

import numpy as np
import pandas as pd

values = [11, 2, 4, 1, 1, 1, 1, 1]
datetimes = [
    datetime.datetime.now() + datetime.timedelta(hours=i * 3)
    for i in range(len(values))
]
df = pd.DataFrame(
    {
        "DateTime": datetimes,
        "Execution": values,
    }
)

boundary = 4
div, mod = df.Execution.divmod(boundary)
# number of extra rows each value spawns: one per full chunk beyond the
# first, plus one for the remainder (exact multiples need no remainder row)
num_repeat = (div - (mod == 0)).clip(lower=0)

new = df.loc[df.index.repeat(num_repeat)].assign(
    # full chunks get `boundary`; the final chunk carries the remainder
    Execution=lambda x: np.where(
        (x.groupby(level=0).cumcount() + 1) < div[x.index], boundary, mod[x.index]
    ),
    # shift each extra row one hour further than the previous one
    DateTime=lambda x: x["DateTime"]
    + pd.to_timedelta(x.groupby(level=0).cumcount() + 1, unit="h"),
)

# cap the original rows at the boundary, then merge and sort
df_modified = df.copy()
df_modified.loc[df_modified.Execution > boundary, "Execution"] = boundary
res = pd.concat([df_modified, new]).sort_values("DateTime").reset_index(drop=True)
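The divmod bookkeeping above boils down to splitting each value into chunks of at most `boundary`; as a standalone illustration (the helper name `split_value` is mine, not from the answer):

```python
def split_value(v, cap=4):
    """Split v into full chunks of `cap` plus an optional remainder chunk."""
    div, mod = divmod(v, cap)
    return [cap] * div + ([mod] if mod else [])

print(split_value(11))  # the sample's 11 executions become three hourly rows
```

Applied to the sample, 11 splits into `[4, 4, 3]`, which is exactly how the capped original row plus the two spawned rows carry the total.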