I have a polars DataFrame with a number of Series that look like:
pl.Series(['cow', 'cat', '', 'lobster', ''])
# Series: '' [str]
# [
# "cow"
# "cat"
# ""
# "lobster"
# ""
# ]
and I'd like them to be
pl.Series(['cow', 'cat', None, 'lobster', None])
# Series: '' [str]
# [
# "cow"
# "cat"
# null
# "lobster"
# null
# ]
A simple string replacement gives me an error:
pl.Series(['cow', 'cat', '', 'lobster', '']).str.replace('', None)
ComputeError: value cannot be 'null' in 'replace' expression
What's the idiomatic way of doing this for a Series/DataFrame in polars?
For a single Series, you can use the set method.
s = pl.Series(["cow", "cat", "", "lobster", ""])
s.set(s.str.len_chars() == 0, None)
shape: (5,)
Series: '' [str]
[
"cow"
"cat"
null
"lobster"
null
]
For DataFrames, I would suggest using when/then/otherwise. For example, with this data:
df = pl.DataFrame({
"str1": ["cow", "dog", "", "lobster", ""],
"str2": ["", "apple", "orange", "", "kiwi"],
"str3": ["house", "", "apartment", "condo", ""],
})
shape: (5, 3)
┌─────────┬────────┬───────────┐
│ str1 ┆ str2 ┆ str3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════╪════════╪═══════════╡
│ cow ┆ ┆ house │
│ dog ┆ apple ┆ │
│ ┆ orange ┆ apartment │
│ lobster ┆ ┆ condo │
│ ┆ kiwi ┆ │
└─────────┴────────┴───────────┘
We can run a replacement on all string columns as follows:
df.with_columns(
pl.when(pl.col(pl.String).str.len_chars() == 0)
.then(None)
.otherwise(pl.col(pl.String))
.name.keep()
)
shape: (5, 3)
┌─────────┬────────┬───────────┐
│ str1 ┆ str2 ┆ str3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════╪════════╪═══════════╡
│ cow ┆ null ┆ house │
│ dog ┆ apple ┆ null │
│ null ┆ orange ┆ apartment │
│ lobster ┆ null ┆ condo │
│ null ┆ kiwi ┆ null │
└─────────┴────────┴───────────┘
The above should be fairly performant.
If you only want to replace empty strings with null on certain columns, you can provide a list:
only_these = ["str1", "str2"]
df.with_columns(
pl.when(pl.col(only_these).str.len_chars() == 0)
.then(None)
.otherwise(pl.col(only_these))
.name.keep()
)
shape: (5, 3)
┌─────────┬────────┬───────────┐
│ str1 ┆ str2 ┆ str3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════╪════════╪═══════════╡
│ cow ┆ null ┆ house │
│ dog ┆ apple ┆ │
│ null ┆ orange ┆ apartment │
│ lobster ┆ null ┆ condo │
│ null ┆ kiwi ┆ │
└─────────┴────────┴───────────┘
To replace empty strings in a string series.
series.replace("", None)
To replace empty strings in a specific dataframe column "A".
df.with_columns(pl.col("A").replace("", None))
To replace empty strings in all string columns of a dataframe.
df.with_columns(pl.col(pl.String).replace("", None))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With