I have this code:
import polars as pl
df1 = pl.DataFrame({
    'type':   ['A', 'O', 'B', 'O'],
    'origin': ['EU', 'US', 'US', 'EU'],
    'qty1':   [343,11,22,-5]
})
df2 = pl.DataFrame({
    'type':   ['A', 'O', 'B', 'S'],
    'origin': ['EU', 'US', 'US', 'AS'],
    'qty2':   [-200,-12,-25,8]
})
df1.join(df2, on=['type', 'origin'], how='full')
which gives
┌──────┬────────┬──────┬────────────┬──────────────┬──────┐
│ type ┆ origin ┆ qty1 ┆ type_right ┆ origin_right ┆ qty2 │
│ ---  ┆ ---    ┆ ---  ┆ ---        ┆ ---          ┆ ---  │
│ str  ┆ str    ┆ i64  ┆ str        ┆ str          ┆ i64  │
╞══════╪════════╪══════╪════════════╪══════════════╪══════╡
│ A    ┆ EU     ┆ 343  ┆ A          ┆ EU           ┆ -200 │
│ O    ┆ US     ┆ 11   ┆ O          ┆ US           ┆ -12  │
│ B    ┆ US     ┆ 22   ┆ B          ┆ US           ┆ -25  │
│ null ┆ null   ┆ null ┆ S          ┆ AS           ┆ 8    │
│ O    ┆ EU     ┆ -5   ┆ null       ┆ null         ┆ null │
└──────┴────────┴──────┴────────────┴──────────────┴──────┘
But the output I am after is this:
┌──────┬────────┬──────┬──────┐
│ type ┆ origin ┆ qty1 ┆ qty2 │
│ ---  ┆ ---    ┆ ---  ┆ ---  │
│ str  ┆ str    ┆ i64  ┆ i64  │
╞══════╪════════╪══════╪══════╡
│ A    ┆ EU     ┆ 343  ┆ -200 │
│ O    ┆ US     ┆ 11   ┆ -12  │
│ B    ┆ US     ┆ 22   ┆ -25  │
│ S    ┆ AS     ┆ null ┆ 8    │
│ O    ┆ EU     ┆ -5   ┆ null │
└──────┴────────┴──────┴──────┘
I tried suffix='' via df1.join(df2, on=['type', 'origin'], how='full', suffix=''), but this raises an error:
DuplicateError: unable to hstack, column with name "type" already exists
How can I achieve this?
You are looking for the coalesce parameter. Setting it to True gives the desired result.
df1.join(df2, on=['type', 'origin'], how='full', coalesce=True)
shape: (5, 4)
┌──────┬────────┬──────┬──────┐
│ type ┆ origin ┆ qty1 ┆ qty2 │
│ ---  ┆ ---    ┆ ---  ┆ ---  │
│ str  ┆ str    ┆ i64  ┆ i64  │
╞══════╪════════╪══════╪══════╡
│ A    ┆ EU     ┆ 343  ┆ -200 │
│ O    ┆ US     ┆ 11   ┆ -12  │
│ B    ┆ US     ┆ 22   ┆ -25  │
│ S    ┆ AS     ┆ null ┆ 8    │
│ O    ┆ EU     ┆ -5   ┆ null │
└──────┴────────┴──────┴──────┘
From the documentation of pl.DataFrame.join.
coalesce
Coalescing behavior (merging of join columns).
None: -> join specific.
True: -> Always coalesce join columns.
False: -> Never coalesce join columns.Note that joining on any other expressions than
colwill turn off coalescing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With