
How to destructure nested structs in polars (python api)?

Unfortunately, I have to work with some nested data in a polars dataframe (I know it is bad practice). Consider this data:

data = {
    "positions": [
        {
            "company": {
                "companyName": "name1"
            },
        },
        {
            "company": {
                "companyName": "name2"
            },
        },
        {
            "company": {
                "companyName": "name3"
            },
        }
    ]
}

positions is a column in the dataframe. I have explored the polars Python API docs but cannot figure out how to extract just the companyName fields into a separate list column.

I want to achieve the same result as this comprehension:


names = (
    [
        p["company"]["companyName"]
        for p in data["positions"]
        if p.get("company") and p.get("company").get("companyName")
    ]
    if data.get("positions")
    else None
)

Note the null checks.

I get the sense that I have to use .list.eval() along with pl.element(), but I am a bit foggy on the API.

Before:
shape: (3, 1)
┌─────────────┐
│ positions   │
│ ---         │
│ struct[1]   │
╞═════════════╡
│ {{"name1"}} │
│ {{"name2"}} │
│ {{"name3"}} │
└─────────────┘

After:
shape: (3, 1)
┌───────┐
│ names │
│ ---   │
│ str   │
╞═══════╡
│ name1 │
│ name2 │
│ name3 │
└───────┘
Asked by Arko, Oct 27 '25 20:10



1 Answer

Structs

You can use .struct.field() or .struct[] syntax to extract struct fields.

  • https://docs.pola.rs/user-guide/expressions/structs/#extracting-individual-values-of-a-struct

import polars as pl

df = pl.DataFrame(data)

df.with_columns(
    pl.col("positions").struct["company"].struct["companyName"]
)

shape: (3, 2)
┌─────────────┬─────────────┐
│ positions   ┆ companyName │
│ ---         ┆ ---         │
│ struct[1]   ┆ str         │
╞═════════════╪═════════════╡
│ {{"name1"}} ┆ name1       │
│ {{"name2"}} ┆ name2       │
│ {{"name3"}} ┆ name3       │
└─────────────┴─────────────┘

Alternatively, you can work at the frame level and .unnest() the structs into columns:

df.unnest("positions").unnest("company")

shape: (3, 1)
┌─────────────┐
│ companyName │
│ ---         │
│ str         │
╞═════════════╡
│ name1       │
│ name2       │
│ name3       │
└─────────────┘

List of structs

If you are working with a list of structs, you could use the .list.eval() API:

df = pl.DataFrame([data])

df.with_columns(
    pl.col("positions").list.eval(
        pl.element().struct["company"].struct["companyName"]
    )
)

shape: (1, 1)
┌─────────────────────────────┐
│ positions                   │
│ ---                         │
│ list[str]                   │
╞═════════════════════════════╡
│ ["name1", "name2", "name3"] │
└─────────────────────────────┘

Or work at the frame level using .explode() and .unnest():

df.explode("positions").unnest("positions").unnest("company")

shape: (3, 1)
┌─────────────┐
│ companyName │
│ ---         │
│ str         │
╞═════════════╡
│ name1       │
│ name2       │
│ name3       │
└─────────────┘
Answered by jqurious, Oct 29 '25 09:10



