Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to extract values based on column names and put it in another column in polars?

I would like to fill a value in a column based on another columns' name, in the Polars library from python (I obtained the following DF by exploding my variables' column names):

Input:

df = pl.from_repr("""
┌────────┬─────────┬────────┬─────┬──────────┐
│ Name   ┆ Average ┆ Median ┆ Q1  ┆ Variable │
│ ---    ┆ ---     ┆ ---    ┆ --- ┆ ---      │
│ str    ┆ i64     ┆ i64    ┆ i64 ┆ str      │
╞════════╪═════════╪════════╪═════╪══════════╡
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Average  │
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Median   │
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Q1       │
│ Banana ┆ 1       ┆ 5      ┆ 10  ┆ Average  │
│ Banana ┆ 1       ┆ 5      ┆ 10  ┆ Median   │
│ Banana ┆ 1       ┆ 5      ┆ 10  ┆ Q1       │
└────────┴─────────┴────────┴─────┴──────────┘
""")

Expected output:

shape: (6, 6)
┌────────┬─────────┬────────┬─────┬──────────┬───────┐
│ Name   ┆ Average ┆ Median ┆ Q1  ┆ Variable ┆ value │
│ ---    ┆ ---     ┆ ---    ┆ --- ┆ ---      ┆ ---   │
│ str    ┆ i64     ┆ i64    ┆ i64 ┆ str      ┆ i64   │
╞════════╪═════════╪════════╪═════╪══════════╪═══════╡
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Average  ┆ 2     │
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Median   ┆ 3     │
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Q1       ┆ 4     │
│ Banana ┆ 1       ┆ 5      ┆ 10  ┆ Average  ┆ 1     │
│ Banana ┆ 1       ┆ 5      ┆ 10  ┆ Median   ┆ 5     │
│ Banana ┆ 1       ┆ 5      ┆ 10  ┆ Q1       ┆ 10    │
└────────┴─────────┴────────┴─────┴──────────┴───────┘

I have tried:

df = df.with_columns(value = pl.col(f"{pl.col.variable}"))

But that does not work because polars perceives the argument as a function (?). Does anyone know how to do this?

Note: I have also tried to transpose the dataframe, which, not only was that computationally expensive, also did not work! Because it would transpose the DF into a 5-rows-long DF. What I need is a (Name * Number of Variables)-rows-long DF.

That is, for example, I have 3 different names (say, Apple, Banana, and Dragonfruit), and I have 3 variables (Average, Median, Q1), then my DF should be 9-rows-long!

like image 747
user24758287 Avatar asked Oct 27 '25 13:10

user24758287


2 Answers

You can use

  • when/then() to check whether the value of the column Variable is the same as the column name.
  • coalesce() to choose first non-empty result.
df.with_columns(
    value = pl.coalesce(
        pl.when(pl.col.Variable == col).then(pl.col(col))
        for col in df["Variable"].unique()
    )
)

┌────────┬─────────┬────────┬─────┬──────────┬───────┐
│ Name   ┆ Average ┆ Median ┆ Q1  ┆ Variable ┆ value │
│ ---    ┆ ---     ┆ ---    ┆ --- ┆ ---      ┆ ---   │
│ str    ┆ i64     ┆ i64    ┆ i64 ┆ str      ┆ i64   │
╞════════╪═════════╪════════╪═════╪══════════╪═══════╡
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Average  ┆ 2     │
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Median   ┆ 3     │
│ Apple  ┆ 2       ┆ 3      ┆ 4   ┆ Q1       ┆ 4     │
│ Banana ┆ 3       ┆ 5      ┆ 10  ┆ Average  ┆ 3     │
│ Banana ┆ 3       ┆ 5      ┆ 10  ┆ Median   ┆ 5     │
│ Banana ┆ 3       ┆ 5      ┆ 10  ┆ Q1       ┆ 10    │
└────────┴─────────┴────────┴─────┴──────────┴───────┘
like image 96
Roman Pekar Avatar answered Oct 29 '25 04:10

Roman Pekar


You can leverage this functionality from polars.when() and polars.then() with a mapping.

For instance,

variable_columns = ["Q1", "Average", "Median"]

expressions = [
    pl.when(pl.col("Variable") == col_name).then(pl.col(col_name))
    for col_name in variable_columns
]

combined_expr = pl.fold(
    acc=pl.lit(None),
    function=lambda acc, x: pl.coalesce(acc, x),
    exprs=expressions
)

df = df.with_columns(combined_expr.alias("Value"))
like image 41
hide1nbush Avatar answered Oct 29 '25 03:10

hide1nbush



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!