I'm unable to use polars dataframes with scikit-learn for ML training.
Currently, I preprocess all dataframes in polars and convert them to pandas for model training so that it works.
Is there any method to directly use polars dataframes with the scikit-learn API (without converting to pandas first)?
Since this question was asked, scikit-learn 1.4 has been released, which improves compatibility with polars.
For example, see the set_output() method of a sklearn.compose.ColumnTransformer instance. It can be used as follows.
We start with some sample data:
import polars as pl
df = pl.DataFrame({
    "num": [1, 2, 3],
    "cat": ["a", "b", "c"],
})
and apply the ColumnTransformer as follows.
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.compose import ColumnTransformer
# create column transformer
transformer = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["num"]),
        ("cat", OrdinalEncoder(), ["cat"]),
    ]
)
# enable polars output (requires scikit-learn >= 1.4)
transformer.set_output(transform="polars")
# fit and transform polars dataframe
transformer.fit_transform(df)
The output is again a pl.DataFrame object:
shape: (3, 2)
┌───────────┬──────────┐
│ num__num  ┆ cat__cat │
│ ---       ┆ ---      │
│ f64       ┆ f64      │
╞═══════════╪══════════╡
│ -1.224745 ┆ 0.0      │
│ 0.0       ┆ 1.0      │
│ 1.224745  ┆ 2.0      │
└───────────┴──────────┘