
Make a categorical column which has categories ['a', 'b', 'c'] in Polars

How do I make a Categorical column which has:

  • elements: ['a', 'b', 'a', 'a']
  • categories ['a', 'b', 'c']

in polars?

In pandas, I would do:

In [31]: pd.Series(pd.Categorical(['a', 'b', 'a', 'a'], categories=['a', 'b', 'c']))
Out[31]:
0    a
1    b
2    a
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

I have no idea how to do this in polars; the docs for Categorical look completely empty: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Categorical.html

ignoring_gravity asked Oct 30 '25 14:10



1 Answer

You can use pl.StringCache:

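import polars as pl
with pl.StringCache():
    # Seed the shared mapping with the desired categories, in order.
    pl.Series(['a', 'b', 'c'], dtype=pl.Categorical())
    # The Series to keep; 'z' is added only to show which code it gets.
    s = pl.Series(['a', 'b', 'a', 'a', 'z'], dtype=pl.Categorical())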

Every Categorical created inside the StringCache context shares the same index/value mapping, so the first Series initializes the mapping with the categories you want. The second Series is the one you want to keep. I added an extra 'z' so that we can see the resulting physical codes:

s.to_physical()
shape: (5,)
Series: '' [u32]
[
    0
    1
    0
    0
    3
]

Note that s skips 2: that code is already taken by 'c', which s does not contain, so 'z' gets 3.
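To make the shared-mapping idea concrete, here is a toy model in plain Python. It is only a sketch of the concept: the SharedStringCache class and its encode method are hypothetical names invented for illustration, not part of Polars and not how Polars implements its cache. It assumes codes are handed out in order of first appearance, which matches the output shown above.

```python
class SharedStringCache:
    """Toy stand-in for a shared string-to-code mapping (hypothetical, not Polars)."""

    def __init__(self):
        # Maps each string to the integer code it was first assigned.
        self._codes = {}

    def encode(self, values):
        """Return integer codes for values, assigning new codes on first sight."""
        codes = []
        for value in values:
            if value not in self._codes:
                # The next free code is the number of strings seen so far.
                self._codes[value] = len(self._codes)
            codes.append(self._codes[value])
        return codes


cache = SharedStringCache()

# Seed the mapping with the desired categories: a -> 0, b -> 1, c -> 2.
cache.encode(["a", "b", "c"])

# Encode the data of interest against the same mapping.
codes = cache.encode(["a", "b", "a", "a", "z"])
print(codes)
```

Because 'c' was already given code 2 during seeding, the unseen 'z' receives the next free code, 3, even though 'c' never appears in the second list.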

Dean MacGregor answered Nov 02 '25 05:11



