
In polars, can I create a categorical type with levels myself?

In Pandas, I can specify the levels of a Categorical type myself:

import pandas as pd

MyCat = pd.CategoricalDtype(categories=['A','B','C'], ordered=True)
my_data = pd.Series(['A','A','B'], dtype=MyCat)

This means that

  1. I can make sure that different columns and sets use the same dtype
  2. I can specify an ordering for the levels.

Is there a way to do this with Polars? I know you can use the string cache feature to achieve 1) in a different way, but I'd like to know whether my dtype/levels can be specified directly. I'm not aware of any way to achieve 2), but I think the categorical dtypes in Arrow do allow an optional ordering, so maybe it's possible?

Jarrad asked Sep 01 '25 22:09



1 Answer

EDIT 2024-02-29:

This answer is outdated. You should use the Polars Enum type (pl.Enum) for this, which lets you declare the categories and their order up front.

Old answer

Not directly, but we can influence how the global string cache is filled. The global string cache simply increments a counter for every new category added.

So if we start with an empty cache and pre-fill it in the order that we think is important, any categories cast later reuse the cached integers.

Here is an example:

import string
import polars as pl

with pl.StringCache():
    # the first run will fill the global string cache counting from 0..25
    # for all 26 letters in the alphabet
    pl.Series(list(string.ascii_uppercase)).cast(pl.Categorical)
    
    # now the global string cache is populated with all categories
    # we cast the string columns
    df = (
        pl.DataFrame({
            "letters": ["A", "B", "D"],
            "more_letters": ["Z", "B", "J"]
        })
        .with_columns(pl.col(pl.String).cast(pl.Categorical))
        .with_columns(pl.col(pl.Categorical).to_physical().name.suffix("_real_category"))
    )

print(df)

shape: (3, 4)
┌─────────┬──────────────┬───────────────────────┬────────────────────────────┐
│ letters ┆ more_letters ┆ letters_real_category ┆ more_letters_real_category │
│ ---     ┆ ---          ┆ ---                   ┆ ---                        │
│ cat     ┆ cat          ┆ u32                   ┆ u32                        │
╞═════════╪══════════════╪═══════════════════════╪════════════════════════════╡
│ A       ┆ Z            ┆ 0                     ┆ 25                         │
│ B       ┆ B            ┆ 1                     ┆ 1                          │
│ D       ┆ J            ┆ 3                     ┆ 9                          │
└─────────┴──────────────┴───────────────────────┴────────────────────────────┘
ritchie46 answered Sep 03 '25 10:09
