How to use polars dataframes with scikit-learn?

Question

I'm unable to use polars dataframes with scikit-learn for ML training.

Currently, I preprocess all dataframes in polars and then convert them to pandas before model training to make it work.

Is there any method to directly use polars dataframes with the scikit-learn API (without converting to pandas first)?

Hericks · Accepted Answer

Since this question was asked, scikit-learn 1.4 has been released, which improves compatibility with polars.

For example, transformers such as sklearn.compose.ColumnTransformer have a set_output() method that can be configured to return polars dataframes. It can be used as follows.

We start with some sample data

import polars as pl

df = pl.DataFrame({
    "num": [1, 2, 3],
    "cat": ["a", "b", "c"],
})

and apply the ColumnTransformer as follows.

from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.compose import ColumnTransformer

# create column transformer
transformer = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["num"]),
        ("cat", OrdinalEncoder(), ["cat"]),
    ]
)

# enable polars output
transformer.set_output(transform="polars")

# fit and transform polars dataframe
transformer.fit_transform(df)

The output is again a pl.DataFrame object:

shape: (3, 2)
┌───────────┬──────────┐
│ num__num  ┆ cat__cat │
│ ---       ┆ ---      │
│ f64       ┆ f64      │
╞═══════════╪══════════╡
│ -1.224745 ┆ 0.0      │
│ 0.0       ┆ 1.0      │
│ 1.224745  ┆ 2.0      │
└───────────┴──────────┘

Tags: python, machine-learning, scikit-learn, python-polars
