I am trying to filter my Polars DataFrame for dates that are nearest to a given date.
For example:
import polars
import datetime
data = {
    "date": ["2025-01-01", "2025-01-01", "2025-01-01", "2026-01-01"],
    "value": [1, 2, 3, 4],
}
df = polars.DataFrame(data).with_columns([polars.col("date").cast(polars.Date)])
shape: (4, 2)
┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1     │
│ 2025-01-01 ┆ 2     │
│ 2025-01-01 ┆ 3     │
│ 2026-01-01 ┆ 4     │
└────────────┴───────┘
Given a date, say:
date = datetime.date(2024, 12, 31)
I want to filter the DataFrame for rows where the date column only includes records that are closest to my required date.
I know that I can do the following:
result = df.with_columns(
    diff=(polars.col("date") - date).abs()
).filter(
    polars.col("diff") == polars.min("diff")
)
shape: (3, 3)
┌────────────┬───────┬──────────────┐
│ date       ┆ value ┆ diff         │
│ ---        ┆ ---   ┆ ---          │
│ date       ┆ i64   ┆ duration[ms] │
╞════════════╪═══════╪══════════════╡
│ 2025-01-01 ┆ 1     ┆ 1d           │
│ 2025-01-01 ┆ 2     ┆ 1d           │
│ 2025-01-01 ┆ 3     ┆ 1d           │
└────────────┴───────┴──────────────┘
Is there a more succinct way to achieve this (without creating a new column, for example)?
You don't need to add the temporary column, just filter directly:
df.filter((m:=(pl.col('date')-date).abs()).min() == m)
Or, without the walrus operator:
diff = (pl.col('date')-date).abs()
df.filter(diff.min() == diff)
Output:
┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1     │
│ 2025-01-01 ┆ 2     │
│ 2025-01-01 ┆ 3     │
└────────────┴───────┘
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With