 

Replace value by null in Polars [duplicate]

Given a Polars DataFrame, is there a way to replace a particular value by "null"? For example, if there's a sentinel value like "_UNKNOWN" and I want to make it truly missing in the dataframe instead.


2 Answers

Update: Expr.replace() has also since been added to Polars.

df.with_columns(pl.col(pl.String).replace("_UNKNOWN", None))
shape: (4, 3)
┌──────┬──────┬─────┐
│ A    ┆ B    ┆ C   │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ i64 │
╞══════╪══════╪═════╡
│ a    ┆ null ┆ 1   │
│ b    ┆ d    ┆ 2   │
│ null ┆ e    ┆ 3   │
│ c    ┆ f    ┆ 4   │
└──────┴──────┴─────┘

You can use .when().then().otherwise()

pl.col(pl.String) is used to select all "string columns".

import polars as pl

df = pl.DataFrame({
    "A": ["a", "b", "_UNKNOWN", "c"],
    "B": ["_UNKNOWN", "d", "e", "f"],
    "C": [1, 2, 3, 4],
})

df.with_columns(
   pl.when(pl.col(pl.String) == "_UNKNOWN")
     .then(None)
     .otherwise(pl.col(pl.String)) # keep original value
     .name.keep()
)
jqurious answered Mar 02 '26 21:03



This is really a tweak of @jqurious's answer.

When a when/then has no otherwise, rows where the condition isn't met default to null. So you can invert the condition and just do:

df.with_columns(
    pl.when(pl.col(pl.String) != "_UNKNOWN")
        .then(pl.col(pl.String)) # keep original value
        .name.keep()
)

If you have several values that should become null, say null_strings = ["_UNKNOWN", "UNK", "who_knows"], then you can use is_in like this:

df.with_columns(
    pl.when(~pl.col(pl.String).is_in(null_strings))
        .then(pl.col(pl.String)) # keep original value
        .name.keep()
)
Dean MacGregor answered Mar 02 '26 19:03



