Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe: resampling time intervals and dividing values proportionally?

I have the following pandas dataframe:

index start_time end_time amount
foo 2023-03-11 09:45:27 2023-03-11 09:58:39 48
bar 2023-03-11 09:59:00 2023-03-11 010:09:00 20

I'm hoping to split the data by hourly intervals, such that amounts would be:

interval amount
2023-03-11 09:00-10:00 50
2023-03-11 10:00-11:00 18

The logic being:

  • 1 minute (=10%) of bar falls within the 09:00-10:00 interval, so 10% of bar's amount is added to the 09:00-10:00 interval, for 48 + 20*0.1 = 50
  • The remainder of bar's amount (18) is assigned to 10:00-11:00

From what I understand, dataframe's .resample method is best suited for splitting by intervals, however:

  • It seems like resample is intended for individual times, not a start & end time
  • There doesn't seem to be a way to proportionally divide times based on intervals

I'm guessing this is a common need when working with time series in Python. Before I go building out something convoluted...is there any built-in/simple way to tackle it?

like image 879
PlankTon Avatar asked Nov 27 '25 19:11

PlankTon


1 Answers

Here is one convoluted way to answer your interesting question using Pandas Timestamp.floor, Timestamp.ceil, and explode:

import pandas as pd

df = pd.DataFrame(
    {
        "index": ["foo", "bar"],
        "start_time": ["2023-03-11 09:45:27", "2023-03-11 09:59:00"],
        "end_time": ["2023-03-11 09:58:39", "2023-03-11 10:09:00"],
        "amount": [48, 20],
    }
)
for col in ("start_time", "end_time"):
    df[col] = pd.to_datetime(df[col], infer_datetime_format=True)
# Find intervals
df["interval"] = df.apply(
    lambda x: [[x["start_time"].floor("H"), x["start_time"].ceil("H")]]
    if (x["end_time"].ceil("H").hour - x["start_time"].floor("H").hour) == 1
    else [
        [x["start_time"].floor("H"), x["end_time"].floor("H")],
        [x["start_time"].ceil("H"), x["end_time"].ceil("H")],
    ],
    axis=1,
)

# Divide values
df["amount"] = df.apply(
    lambda x: x["amount"]
    if (x["end_time"].ceil("H").hour - x["start_time"].floor("H").hour) == 1
    else [
        int(
            x["amount"]
            * 60
            * (60 - x["start_time"].minute)
            / (x["end_time"] - x["start_time"]).total_seconds()
        ),
        int(
            x["amount"]
            - x["amount"]
            * 60
            * (60 - x["start_time"].minute)
            / (x["end_time"] - x["start_time"]).total_seconds()
        ),
    ],
    axis=1,
)

# Deal with lists of intervals
tmp = df.loc[df["interval"].apply(len) == 2].explode(["interval", "amount"])
other = df.loc[~df.index.isin(tmp.index), :].pipe(
    lambda df_: df_.assign(interval=df_["interval"].apply(lambda x: x[0]))
)
df = pd.concat([other, tmp])

# Compute final values
df = (
    df.assign(interval=df["interval"].apply(tuple))
    .groupby("interval")
    .agg({"amount": sum})
)

Then:

print(df)
# Output
                                           amount
interval
(2023-03-11 09:00:00, 2023-03-11 10:00:00)     50
(2023-03-11 10:00:00, 2023-03-11 11:00:00)     18
like image 189
Laurent Avatar answered Nov 30 '25 07:11

Laurent



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!