To search for a string across multiple columns and add a flag column showing whether it was found, the following code works. Is there a more compact way to do the same inside with_columns()?
import polars as pl

df = pl.DataFrame({
    "col1": ["hello", "world", "polars"],
    "col2": ["data", "science", "hello"],
    "col3": ["test", "string", "match"],
    "col4": ["hello", "example", "test"]
})
search_string = "hello"

# Start from False and OR in one "contains" check per column
condition = pl.lit(False)
for col in df.columns:
    condition |= pl.col(col).str.contains(search_string)

# Adding 0 turns the boolean result into an integer flag
df = df.with_columns(
    condition.alias("string_found") + 0
)
print(df)
shape: (3, 5)
┌────────┬─────────┬────────┬─────────┬──────────────┐
│ col1   ┆ col2    ┆ col3   ┆ col4    ┆ string_found │
│ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---          │
│ str    ┆ str     ┆ str    ┆ str     ┆ i32          │
╞════════╪═════════╪════════╪═════════╪══════════════╡
│ hello  ┆ data    ┆ test   ┆ hello   ┆ 1            │
│ world  ┆ science ┆ string ┆ example ┆ 0            │
│ polars ┆ hello   ┆ match  ┆ test    ┆ 1            │
└────────┴─────────┴────────┴─────────┴──────────────┘
You can use pl.any_horizontal():
df.with_columns(
    pl.any_horizontal(pl.all().str.contains(search_string))
    .alias("string_found")
)
shape: (3, 5)
┌────────┬─────────┬────────┬─────────┬──────────────┐
│ col1   ┆ col2    ┆ col3   ┆ col4    ┆ string_found │
│ ---    ┆ ---     ┆ ---    ┆ ---     ┆ ---          │
│ str    ┆ str     ┆ str    ┆ str     ┆ bool         │
╞════════╪═════════╪════════╪═════════╪══════════════╡
│ hello  ┆ data    ┆ test   ┆ hello   ┆ true         │
│ world  ┆ science ┆ string ┆ example ┆ false        │
│ polars ┆ hello   ┆ match  ┆ test    ┆ true         │
└────────┴─────────┴────────┴─────────┴──────────────┘
You can replace pl.all() with pl.col(pl.String) to limit the expression to String columns only. In this example every column is a String column, so it makes no difference.