
A scalable way of checking if a string column is contained within another string column in Polars

Is there a scalable way of creating the column B_in_A below that doesn't rely on map_elements?

import polars as pl

df = pl.DataFrame({"A":["foo","bar","foo"],"B":["f","b","s"]})

df = (
    df
    .with_columns(
        pl.struct(["A","B"])
        .map_elements(lambda row: (
            row["B"] in row["A"]
            ), return_dtype=pl.Boolean).alias("B_in_A")
    )
)
print(df)

The output is:

shape: (3, 3)

┌─────┬─────┬────────┐
│ A   ┆ B   ┆ B_in_A │
│ --- ┆ --- ┆ ---    │
│ str ┆ str ┆ bool   │
╞═════╪═════╪════════╡
│ foo ┆ f   ┆ true   │
│ bar ┆ b   ┆ true   │
│ foo ┆ s   ┆ false  │
└─────┴─────┴────────┘
DataJack asked Nov 18 '25 15:11


1 Answer

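Use str.contains, which accepts an expression as the pattern, so each value of B is checked against the value of A in the same row without leaving Polars' native engine: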

df.with_columns(B_in_A=pl.col('A').str.contains(pl.col('B')))
shape: (3, 3)
┌─────┬─────┬────────┐
│ A   ┆ B   ┆ B_in_A │
│ --- ┆ --- ┆ ---    │
│ str ┆ str ┆ bool   │
╞═════╪═════╪════════╡
│ foo ┆ f   ┆ true   │
│ bar ┆ b   ┆ true   │
│ foo ┆ s   ┆ false  │
└─────┴─────┴────────┘
Dean MacGregor answered Nov 21 '25 05:11