
How do you select fields from all structs in a list in Polars?

I'm working with a deeply nested DataFrame (not good practice, I know), and I'd like to express something like "select field X for all structs in list Y".

An example of the data structure:

import polars as pl

data = {
    "a": [
        [{
            "x": [1, 2, 3],
            "y": [4, 5, 6]
        },
        {
            "x": [2, 3, 4],
            "y": [3, 4, 5]
        }
        ]
    ],
}
df = pl.DataFrame(data)

In this case, I'd like to select field "x" from both structs and gather the results into a DataFrame with two columns, called "x_1" and "x_2".

In other words, the desired output is:

┌───────────┬───────────┐
│ x_1       ┆ x_2       │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2, 3, 4] │
└───────────┴───────────┘

I don't know the length of the list ahead of time, so I'd like to do this dynamically (i.e. without hard-coding the field names). Is this possible using Polars expressions?

Thanks in advance!

Ludde asked Oct 16 '25 00:10


1 Answer

Update: Perhaps a simpler approach using .unstack()

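# Flatten the list to one struct per row, keep field "x",
# then unstack each row into its own column (named x_0, x_1, ...).
(df.select(pl.col("a").flatten().struct.field("x"))
   .unstack(step=1)
)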
shape: (1, 2)
┌───────────┬───────────┐
│ x_0       ┆ x_1       │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2, 3, 4] │
└───────────┴───────────┘

Original answer:

df.select(
   pl.col("a").list.eval(pl.element().struct["x"])
     .list.to_struct("max_width", lambda idx: f"x_{idx + 1}")
).unnest("a")
shape: (1, 2)
┌───────────┬───────────┐
│ x_1       ┆ x_2       │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2, 3, 4] │
└───────────┴───────────┘

Explanation

  • .list.eval() runs an expression on every list element; here it extracts field "x" from each struct.
df.select(
   pl.col("a").list.eval(pl.element().struct["x"])
)

# shape: (1, 1)
# ┌────────────────────────┐
# │ a                      │
# │ ---                    │
# │ list[list[i64]]        │
# ╞════════════════════════╡
# │ [[1, 2, 3], [2, 3, 4]] │
# └────────────────────────┘
  • .list.to_struct() converts the outer list to a struct, so that each inner list can become its own column. The callable names the fields x_1, x_2, and so on.
df.select(
   pl.col("a").list.eval(pl.element().struct["x"])
     .list.to_struct("max_width", lambda idx: f"x_{idx + 1}")
)

# shape: (1, 1)
# ┌───────────────────────┐
# │ a                     │
# │ ---                   │
# │ struct[2]             │
# ╞═══════════════════════╡
# │ {[1, 2, 3],[2, 3, 4]} │
# └───────────────────────┘
  • .unnest() splits the struct into individual columns, one per field.
jqurious answered Oct 17 '25 14:10


