
polars.read_csv() with german number formatting

Is it possible in Polars to read a CSV with German number formatting, the way pandas.read_csv() allows with its "decimal" and "thousands" parameters?

alexp asked Oct 15 '25 21:10


2 Answers

Currently, the Polars read_csv method does not expose those parameters.

However, there is an easy workaround. For example, given the CSV below, let Polars read the German-formatted numbers as strings (Utf8) first.

import polars as pl

my_csv = b"""col1\tcol2\tcol3
1.234,5\tabc\t1.234.567
9.876\tdef\t3,21
"""
df = pl.read_csv(my_csv, separator="\t")
print(df)
shape: (2, 3)
┌─────────┬──────┬───────────┐
│ col1    ┆ col2 ┆ col3      │
│ ---     ┆ ---  ┆ ---       │
│ str     ┆ str  ┆ str       │
╞═════════╪══════╪═══════════╡
│ 1.234,5 ┆ abc  ┆ 1.234.567 │
│ 9.876   ┆ def  ┆ 3,21      │
└─────────┴──────┴───────────┘

From here, the conversion is just a few lines of code:

df = df.with_columns(
    pl.col("col1", "col3")
    .str.replace_all(r"\.", "")  # drop the thousands separators
    .str.replace(",", ".")  # turn the decimal comma into a decimal point
    .cast(pl.Float64)  # or whatever datatype needed
)
print(df)
shape: (2, 3)
┌────────┬──────┬────────────┐
│ col1   ┆ col2 ┆ col3       │
│ ---    ┆ ---  ┆ ---        │
│ f64    ┆ str  ┆ f64        │
╞════════╪══════╪════════════╡
│ 1234.5 ┆ abc  ┆ 1.234567e6 │
│ 9876.0 ┆ def  ┆ 3.21       │
└────────┴──────┴────────────┘

Just be careful to apply this logic only to columns whose numbers use the German format. It will mangle numbers formatted for other locales: an English-style "1,234.5", for example, becomes "1.2345".

In Polars 0.20.26 (the current version at the time of writing), read_csv has a flag for this: decimal_comma.

Example:

import polars as pl

df = pl.read_csv('foo.csv', decimal_comma=True)

Hint: This doesn't work in combination with use_pyarrow=True.

kraego answered Oct 18 '25 10:10