I am trying to load data into a polars DataFrame using read_csv, but I keep getting this error:
RuntimeError: Any(ComputeError("Could not parse 0.5 as dtype Int64 at column 13.\n The total offset in the file is 11684833 bytes.\n\n Consider running the parser `with_ignore_parser_errors=true`\n or consider adding 0.5 to the `null_values` list."))
I tried using the converters argument as follows:
converters = {
'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
'Number': lambda x: float(x)
}
The error still persists. I also tried to use the argument displayed in the error:
ignore_errors=True
The error is still there. What can I do? My issue is not with parsing dates, but rather with parsing numbers. This is what I have for now:
converters = {
'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
'Number': lambda x: float(x)
}
df_file = pl.read_csv(file_to_read, has_headers=True, converters=converters, ignore_errors=True)
Polars' read_csv doesn't have a converters argument, so that won't work.
It looks like polars is trying to parse a floating point column as integers. You can set the dtype to pl.Float64 manually by passing the column name in schema_overrides:
pl.read_csv(..., schema_overrides={"foo": pl.Float64})
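Or you can increase infer_schema_length so that polars detects the floats automatically. The default is 100, and the first 100 rows of that column probably contain only integers.
Try increasing it until schema inference correctly detects the floating point column. Setting infer_schema_length=None makes polars scan the whole file, which is slower but avoids guessing a row count.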