How to use polars dataframes with scikit-learn?

I'm unable to use polars dataframes with scikit-learn for ML training.

Currently, I preprocess all dataframes in polars and then convert them to pandas for model training, because that is the only way I can get it to work.

Is there any method to directly use polars dataframes with the scikit-learn API (without converting to pandas first)?

asked Feb 18 '26 04:02 by Regular Tech Guy


1 Answer

Since the question was asked, scikit-learn 1.4 has been released, which improves compatibility with polars.

For example, see the set_output() method of an instance of the sklearn.compose.ColumnTransformer. It can be used as follows.

We start with some sample data

import polars as pl

df = pl.DataFrame({
    "num": [1, 2, 3],
    "cat": ["a", "b", "c"],
})

and apply the ColumnTransformer as follows.

from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.compose import ColumnTransformer

# create column transformer
transformer = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["num"]),
        ("cat", OrdinalEncoder(), ["cat"]),
    ]
)

# enable polars output
transformer.set_output(transform="polars")

# fit and transform polars dataframe
transformer.fit_transform(df)

The output is again a pl.DataFrame object:

shape: (3, 2)
┌───────────┬──────────┐
│ num__num  ┆ cat__cat │
│ ---       ┆ ---      │
│ f64       ┆ f64      │
╞═══════════╪══════════╡
│ -1.224745 ┆ 0.0      │
│ 0.0       ┆ 1.0      │
│ 1.224745  ┆ 2.0      │
└───────────┴──────────┘
answered Feb 20 '26 18:02 by Hericks


