
How to convert time durations to numeric in polars?

Is there a built-in function in polars, or otherwise a better way, to convert time durations to numeric values at a chosen time resolution (e.g. days, hours, minutes)?

import polars as pl

df = pl.DataFrame({
    "from": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "to": ["2023-01-04", "2023-01-05", "2023-01-06"],
})

My current approach:

# Convert to date and calculate the time difference
df = (
    df.with_columns(
        pl.col("to", "from").str.to_date().name.suffix("_date")
    )
    .with_columns((pl.col("to_date") - pl.col("from_date")).alias("time_diff"))
)

# Convert the time difference to int (in days)
df = df.with_columns(
    ((pl.col("time_diff") / (24 * 60 * 60 * 1000)).cast(pl.Int8)).alias("time_diff_int")
)

Output:

shape: (3, 6)
┌────────────┬────────────┬────────────┬────────────┬──────────────┬───────────────┐
│ from       ┆ to         ┆ to_date    ┆ from_date  ┆ time_diff    ┆ time_diff_int │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---          ┆ ---           │
│ str        ┆ str        ┆ date       ┆ date       ┆ duration[ms] ┆ i8            │
╞════════════╪════════════╪════════════╪════════════╪══════════════╪═══════════════╡
│ 2023-01-01 ┆ 2023-01-04 ┆ 2023-01-04 ┆ 2023-01-01 ┆ 3d           ┆ 3             │
│ 2023-01-02 ┆ 2023-01-05 ┆ 2023-01-05 ┆ 2023-01-02 ┆ 3d           ┆ 3             │
│ 2023-01-03 ┆ 2023-01-06 ┆ 2023-01-06 ┆ 2023-01-03 ┆ 3d           ┆ 3             │
└────────────┴────────────┴────────────┴────────────┴──────────────┴───────────────┘

Guz asked Jan 29 '26 20:01


2 Answers

The dt accessor lets you express a duration as a total count in a given unit (days, hours, minutes). Is that what you're looking for?

df.select(
    total_days = pl.col.time_diff.dt.total_days(),
    total_hours = pl.col.time_diff.dt.total_hours(),
    total_minutes = pl.col.time_diff.dt.total_minutes()
)
shape: (3, 3)
┌────────────┬─────────────┬───────────────┐
│ total_days ┆ total_hours ┆ total_minutes │
│ ---        ┆ ---         ┆ ---           │
│ i64        ┆ i64         ┆ i64           │
╞════════════╪═════════════╪═══════════════╡
│ 3          ┆ 72          ┆ 4320          │
│ 3          ┆ 72          ┆ 4320          │
│ 3          ┆ 72          ┆ 4320          │
└────────────┴─────────────┴───────────────┘

docs: Temporal API reference

FObersteiner answered Jan 31 '26 11:01


The interface has changed since the accepted answer was posted.

Still, the total days of a pl.Duration column can be extracted using pl.Expr.dt.total_days.

df.select(
    pl.col("time_diff").dt.total_days()
)
shape: (3, 1)
┌───────────┐
│ time_diff │
│ ---       │
│ i64       │
╞═══════════╡
│ 3         │
│ 3         │
│ 3         │
└───────────┘

Similar functions exist in the pl.Expr.dt namespace to extract the total hours, minutes, seconds, etc.

  • pl.Expr.dt.total_hours
  • pl.Expr.dt.total_minutes
  • pl.Expr.dt.total_seconds

Hericks answered Jan 31 '26 10:01


