Suppose I have a Polars DataFrame with a list column of strings:
┌─────────────────────────────────────────────────┐
│ words │
│ --- │
│ list[str] │
╞═════════════════════════════════════════════════╡
│ ["i", "like", "the", "pizza"] │
│ ["the", "dog", "is", "runnig"] │
│ ["me", "and", "my", "friend", "are", "playing"] │
└─────────────────────────────────────────────────┘
I would like to filter stop words out of every list.
I can do this by applying a custom function with map_elements:
import polars as pl

pl.Config(fmt_table_cell_list_len=8, fmt_str_lengths=80)

df = pl.DataFrame({
    "words": [["i", "like", "the", "pizza"],
              ["the", "dog", "is", "runnig"],
              ["me", "and", "my", "friend", "are", "playing"]]
})

STOP_WORDS = ["the"]

# Python UDF: called once per row with that row's list
filtered_df = df.with_columns(
    pl.col("words").map_elements(
        lambda words: [word for word in words if word not in STOP_WORDS],
        return_dtype=pl.List(pl.String),
    )
)
print(filtered_df)
shape: (3, 1)
┌─────────────────────────────────────────────────┐
│ words │
│ --- │
│ list[str] │
╞═════════════════════════════════════════════════╡
│ ["i", "like", "pizza"] │
│ ["dog", "is", "runnig"] │
│ ["me", "and", "my", "friend", "are", "playing"] │
└─────────────────────────────────────────────────┘
However, the docs state that custom UDFs are much slower than native expressions, so I would prefer a solution based on the native API.
Is there a built-in function in Polars to achieve this?
Thanks.
Using .list.set_difference() may also be an option:
df.with_columns(
    pl.col("words").list.set_difference(STOP_WORDS)
)
shape: (3, 1)
┌─────────────────────────────────────────────────┐
│ words │
│ --- │
│ list[str] │
╞═════════════════════════════════════════════════╡
│ ["i", "like", "pizza"] │
│ ["runnig", "dog", "is"] │
│ ["me", "and", "my", "friend", "are", "playing"] │
└─────────────────────────────────────────────────┘
Note that the set-based approach also removes duplicates, which may or may not be desired. As the second row above shows, it also does not necessarily preserve the original element order.
pl.Series([["a", "a", "a", "b", "c"]]).list.set_difference(["c", "d"])
shape: (1,)
Series: '' [list[str]]
[
["a", "b"]
]