
Compare 2 tables in Polars and select a value based on that comparison

I have a table like this in polars:

df_a = pl.from_repr("""
┌─────────────────────┬───────┐
│ arrival_time        ┆ Train │
│ ---                 ┆ ---   │
│ datetime[ns]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2024-10-04 08:40:10 ┆ 112   │
│ 2024-10-04 19:31:26 ┆ 134   │
└─────────────────────┴───────┘
""")

And I have another table that defines the period of the day based on the hour:

df_b = pl.from_repr("""
┌─────────────────────┬───────────┐
│ Time                ┆ Period    │
│ ---                 ┆ ---       │
│ datetime[ns]        ┆ str       │
╞═════════════════════╪═══════════╡
│ 2024-10-04 08:00:00 ┆ Early     │
│ 2024-10-04 16:00:00 ┆ Afternoon │
└─────────────────────┴───────────┘
""")

What I am looking for is a Polars-native way to combine the two tables so that the first table gains the Period of the day as a new column:

┌─────────────────────┬───────┬───────────┐
│ arrival_time        ┆ Train ┆ Period    │
│ ---                 ┆ ---   ┆ ---       │
│ datetime[ns]        ┆ i64   ┆ str       │
╞═════════════════════╪═══════╪═══════════╡
│ 2024-10-04 08:40:10 ┆ 112   ┆ Early     │
│ 2024-10-04 19:31:26 ┆ 134   ┆ Afternoon │
└─────────────────────┴───────┴───────────┘

For now I am working entirely with dictionaries: I zip the two columns of my comparison table and pick the key with the minimum distance between the two time columns:

min(dict(zip(df.Period, df.Time)).items(), key=lambda x: abs(pl.col('arrival_time') - x[1]))[0]

But I am fairly sure there is a better way to do this in Polars.

luis.martinez.pro asked Oct 29 '25 09:10


1 Answer

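Polars has join_asof, which joins each row to the closest key in the other frame rather than requiring an exact match. The default strategy="backward" picks the last key at or before the left key; strategy="forward" and strategy="nearest" are also available. Both frames must be sorted by the join key.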


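from datetime import time

import polars as pl

# Arrival times, simplified here to time-of-day values
df_a = pl.DataFrame({
    "arrival_time": [time(8, 40, 10), time(19, 31, 26)],
    "train": [112, 134],
})

# Start of each period; the key column shares its name with df_a's key
df_b = pl.DataFrame({
    "arrival_time": [time(8), time(16)],
    "period": ["early", "afternoon"],
})

# Default strategy="backward": match the latest period start <= arrival_time
print(df_a.join_asof(df_b, on="arrival_time"))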
shape: (2, 3)
┌──────────────┬───────┬───────────┐
│ arrival_time ┆ train ┆ period    │
│ ---          ┆ ---   ┆ ---       │
│ time         ┆ i64   ┆ str       │
╞══════════════╪═══════╪═══════════╡
│ 08:40:10     ┆ 112   ┆ early     │
│ 19:31:26     ┆ 134   ┆ afternoon │
└──────────────┴───────┴───────────┘

ritchie46 answered Oct 31 '25 00:10


