Is it possible in Polars to read a CSV with German number formatting, the way pandas.read_csv() allows with its "decimal" and "thousands" parameters?
Currently, the Polars read_csv method does not expose those parameters.
However, there is an easy workaround: let Polars read the German-formatted numbers as strings, then convert them. For example, with this CSV:
import polars as pl
my_csv = b"""col1\tcol2\tcol3
1.234,5\tabc\t1.234.567
9.876\tdef\t3,21
"""
df = pl.read_csv(my_csv, separator="\t")
print(df)
shape: (2, 3)
┌─────────┬──────┬───────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════╪══════╪═══════════╡
│ 1.234,5 ┆ abc ┆ 1.234.567 │
│ 9.876 ┆ def ┆ 3,21 │
└─────────┴──────┴───────────┘
From here, the conversion is just a few lines of code:
df = df.with_columns(
    pl.col("col1", "col3")
    .str.replace_all(r"\.", "")   # drop the thousands separators
    .str.replace(",", ".")        # turn the decimal comma into a decimal point
    .cast(pl.Float64)             # or whatever datatype is needed
)
print(df)
shape: (2, 3)
┌────────┬──────┬────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 │
╞════════╪══════╪════════════╡
│ 1234.5 ┆ abc ┆ 1.234567e6 │
│ 9876.0 ┆ def ┆ 3.21 │
└────────┴──────┴────────────┘
Just be careful to apply this logic only to numbers encoded in German locale. It will mangle numbers formatted in other locales.
In the current version of Polars (0.20.26), there is a flag for this: decimal_comma.
Example:
import polars as pl
df = pl.read_csv('foo.csv', decimal_comma=True)
Note: this does not work in combination with use_pyarrow=True.