Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Polars: change a value in a dataframe if a condition is met in another column

I have this dataframe

import polars as pl

df = pl.from_repr("""
┌─────┬───────┐
│ one ┆ two   │
│ --- ┆ ---   │
│ str ┆ str   │
╞═════╪═══════╡
│ a   ┆ hola  │
│ b   ┆ world │
└─────┴───────┘
""")

And I want to change hola for hello:

shape: (2, 2)
┌─────┬───────┐
│ one ┆ two   │
│ --- ┆ ---   │
│ str ┆ str   │
╞═════╪═══════╡
│ a   ┆ hello │ # <-
│ b   ┆ world │
└─────┴───────┘

How can I change the values of a row based on a condition in another column?

For instance, with PostgreSQL I could do this:

UPDATE my_table SET two = 'hello' WHERE one = 'a';

Or in Spark

my_table.withColumn("two", when(col("one") == "a", "hello"))

I've tried using with_columns(pl.when(pl.col("one") == "a").then("hello")) but that changes the column "one".

EDIT: I could create a SQL instance and plot my way through via SQL but there must be way to achieve this via the Python API.

like image 786
user18140022 Avatar asked Dec 30 '25 05:12

user18140022


2 Answers

You were really close with with_columns(pl.when(pl.col("one") == "a").then("hello")) but you needed to tell it which column that should be.

When you don't tell it which column you're referring to then it has to guess and in this case it guessed the column you referred to.

Instead you do

(df 
    .with_columns(
        two=pl.when(pl.col('one')=='a')
                .then(pl.lit('hello'))
                .otherwise(pl.col('two')))
)

This uses the **kwargs input of with_columns to allow the column named to be on the left of an equal sign as though it were a parameter to a function. You can also use alias syntax like this...

(df 
    .with_columns(
        (pl.when(pl.col('one')=='a')
                .then(pl.lit('hello'))
                .otherwise(pl.col('two')))
            .alias('two')
                )
)

Note that I wrapped the entire when/then/otherwise in parenthesis. The order of operations around when/then/otherwise and alias is weird so I find it's best to always completely wrap them in parenthesis to avoid unexpected results. Worst case scenario is you have redundant parenthesis which doesn't hurt anything.

like image 135
Dean MacGregor Avatar answered Jan 01 '26 19:01

Dean MacGregor


You need to name the expression to overwrite the existing two column, either using .alias() or a named arg .with_columns(two = ...)

You also need to provide .otherwise() as the default is .otherwise(None).

df = pl.DataFrame({"one": ["a", "b"], "two": ["hola", "world"]})

df.with_columns(
   pl.when(pl.col.one == "a")
     .then(pl.lit("hello"))
     .otherwise(pl.col.two)
     .alias("two")
)
shape: (2, 2)
┌─────┬───────┐
│ one ┆ two   │
│ --- ┆ ---   │
│ str ┆ str   │
╞═════╪═══════╡
│ a   ┆ hello │
│ b   ┆ world │
└─────┴───────┘

The .then() branch generates the name, so another option is to invert the logic and put the column inside .then() which allows polars to infer the name for you.

df.with_columns(
   pl.when(pl.col.one != "a")
     .then(pl.col.two)
     .otherwise(pl.lit("hello"))
)

shape: (2, 2)
┌─────┬───────┐
│ one ┆ two   │
│ --- ┆ ---   │
│ str ┆ str   │
╞═════╪═══════╡
│ a   ┆ hello │
│ b   ┆ world │
└─────┴───────┘
like image 45
jqurious Avatar answered Jan 01 '26 20:01

jqurious



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!