How do I make a Categorical column in polars that has the values ['a', 'b', 'a', 'a'] and the categories ['a', 'b', 'c']?
In pandas, I would do:
In [31]: pd.Series(pd.Categorical(['a', 'b', 'a', 'a'], categories=['a', 'b', 'c']))
Out[31]:
0 a
1 b
2 a
3 a
dtype: category
Categories (3, object): ['a', 'b', 'c']
I have no idea how to do this in polars; the docs for Categorical look completely empty:
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Categorical.html
You can use the StringCache:
import polars as pl
with pl.StringCache():
    # Seed the shared mapping with the categories you want, in order.
    pl.Series(['a', 'b', 'c'], dtype=pl.Categorical())
    s = pl.Series(['a', 'b', 'a', 'a', 'z'], dtype=pl.Categorical())
Everything inside the StringCache context shares the same index/value mapping, so the first Series initializes the mapping with the categories you want. The second Series, s, is the one you want to keep. I added an extra 'z' so that we can see how the physical codes are assigned:
s.to_physical()
shape: (5,)
Series: '' [u32]
[
0
1
0
0
3
]
Note that s skips code 2 because it contains no 'c' value, so the extra 'z' gets code 3.
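To make the shared index/value mapping more concrete, here is a toy model in plain Python. It is only a sketch of the idea and not how polars implements it internally; the class name ToyStringCache and its encode method are made up for illustration.

```python
class ToyStringCache:
    """Hypothetical stand-in for a shared string -> integer code mapping."""

    def __init__(self):
        self.codes = {}

    def encode(self, values):
        # A string gets the next unused code the first time it is seen;
        # afterwards the same code is reused for every later occurrence.
        return [self.codes.setdefault(v, len(self.codes)) for v in values]


cache = ToyStringCache()
cache.encode(['a', 'b', 'c'])                      # seeds a=0, b=1, c=2
codes = cache.encode(['a', 'b', 'a', 'a', 'z'])    # reuses a and b, adds z
print(codes)
```

Because 'c' was registered by the first call, the new value 'z' receives the next free code, which is why the second list contains no 2.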