In pandas, the following code splits the string in col1 into multiple columns. Is there a way to do this in polars?
import polars as pl
data = {"col1": ["a/b/c/d", "a/b/c/d"]}
df = pl.DataFrame(data)
df_pd = df.to_pandas()
df_pd[["a", "b", "c", "d"]] = df_pd["col1"].str.split("/", expand=True)
pl.from_pandas(df_pd)
shape: (2, 5)
┌─────────┬─────┬─────┬─────┬─────┐
│ col1    ┆ a   ┆ b   ┆ c   ┆ d   │
│ ---     ┆ --- ┆ --- ┆ --- ┆ --- │
│ str     ┆ str ┆ str ┆ str ┆ str │
╞═════════╪═════╪═════╪═════╪═════╡
│ a/b/c/d ┆ a   ┆ b   ┆ c   ┆ d   │
│ a/b/c/d ┆ a   ┆ b   ┆ c   ┆ d   │
└─────────┴─────┴─────┴─────┴─────┘
You can convert the split list to a struct datatype with .list.to_struct() and then unnest the struct into columns.
import polars as pl
df = pl.DataFrame({
    "my_str": ["cat", "cat/dog", None, "", "cat/dog/aardvark/mouse/frog"],
})
# Split on "/" into a list, convert the list to a struct, then unnest it into columns.
df.select(
    pl.col("my_str")
    .str.split("/")
    .list.to_struct(n_field_strategy="max_width")
).unnest("my_str")
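Notice you must use n_field_strategy="max_width". Otherwise the number of fields is inferred from the first non-null row (here "cat", which has a single part), and unnest() will create only 1 column.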
Update: for polars >= v1.33, n_field_strategy is deprecated; you must pass either fields (as a sequence of field names) or upper_bound instead.