
Polars: Casting a Column to Decimal

I am trying to read a flat file, assign column names to the dataframe, and cast a few columns as required. However, casting a column to Decimal gives me an error in Polars. I implemented the same logic successfully in Spark, but I need help with how to do the same in Polars.

Sample data

data = b"""
D|120277|Patricia|167.2|26/12/1982
D|757773|Charles|167.66|08/04/2019 
D|248498|Katrina|168.68|20/11/2016 
D|325561|Christina|170.86|05/10/1998 
D|697464|Joshua|171.41|07/09/1970 
D|244750|Zachary|169.43|08/12/2014
""".strip()

Polars Script

import polars as pl

cols_dict = {'column_1': 'rtype', 'column_2': 'EMP_ID', 'column_3': 'EMP_NAME', 'column_4': 'SALARY', 'column_5': 'DOB'}

df = pl.read_csv(data, separator='|', has_header=False)
# Rename the auto-generated column_1..column_5 names via the mapping above.
df = df.select(pl.all().name.map(lambda col_name: cols_dict.get(col_name)))

df = df.with_columns(
    pl.col('EMP_ID').cast(pl.Decimal(scale=6, precision=0))
)

It throws this exception:

InvalidOperationError: conversion from i64 to decimal[1,6] failed in column 'EMP_ID' for 5 out of 5 values: [120277, 757773, … 697464]

Spark Script:

rdd = sparkContext.textFile(core_data_file).filter(lambda x: x[0] == "D").map(
    lambda x: x.split('|'))

sparkDf = rdd.toDF(schema=["rtype"] + list(cols_dict.values()))
sparkDf = sparkDf.withColumn(col_name, coalesce(
    col(col_name).cast(DecimalType(data_length, data_scale)), lit(0)))
sparkDf.show()
+-----+------+---------+------+-------------------+
|rtype|EMP_ID|EMP_NAME |SALARY|DOB                |
+-----+------+---------+------+-------------------+
|D    |120277|Patricia |167.20|1982-12-26 00:00:00|
|D    |757773|Charles  |167.66|2019-04-08 00:00:00|
|D    |248498|Katrina  |168.68|2016-11-20 00:00:00|
|D    |325561|Christina|170.86|1998-10-05 00:00:00|
|D    |697464|Joshua   |171.41|1970-09-07 00:00:00|
|D    |244750|Zachary  |169.43|2014-12-08 00:00:00|
+-----+------+---------+------+-------------------+


1 Answer

In Polars, the positional order of precision and scale is reversed from most other SQL-like definitions. That is why your cast produced decimal[1,6] (precision 1, scale 6): every available digit sits after the decimal point, leaving no room for a six-digit integer ID.

In your Spark code, DecimalType expects precision first, then scale:

col(col_name).cast(DecimalType(data_length, data_scale))

In Polars, however, Decimal defaults precision to None and gives scale no default, so scale comes first and the positional order is reversed:

# A decimal[6,0]
pl.Decimal(0, 6)
pl.Decimal(scale=0, precision=6)
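Passing both arguments by keyword, as in the second line above, removes any dependence on the positional order.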

With decimal support activated and precision and scale given in the right order, the cast works:

pl.Config(activate_decimals=True)
df = df.with_columns(pl.col("EMP_ID").cast(pl.Decimal(precision=6, scale=0)))

print(df)
shape: (6, 5)
┌───────┬──────────────┬───────────┬────────┬─────────────┐
│ rtype ┆ EMP_ID       ┆ EMP_NAME  ┆ SALARY ┆ DOB         │
│ ---   ┆ ---          ┆ ---       ┆ ---    ┆ ---         │
│ str   ┆ decimal[6,0] ┆ str       ┆ f64    ┆ str         │
╞═══════╪══════════════╪═══════════╪════════╪═════════════╡
│ D     ┆ 120277       ┆ Patricia  ┆ 167.2  ┆ 26/12/1982  │
│ D     ┆ 757773       ┆ Charles   ┆ 167.66 ┆ 08/04/2019  │
│ D     ┆ 248498       ┆ Katrina   ┆ 168.68 ┆ 20/11/2016  │
│ D     ┆ 325561       ┆ Christina ┆ 170.86 ┆ 05/10/1998  │
│ D     ┆ 697464       ┆ Joshua    ┆ 171.41 ┆ 07/09/1970  │
│ D     ┆ 244750       ┆ Zachary   ┆ 169.43 ┆ 08/12/2014  │
└───────┴──────────────┴───────────┴────────┴─────────────┘
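The Spark snippet also wraps the cast in coalesce(..., lit(0)) so that unconvertible values become 0. A rough Polars equivalent (a sketch, assuming failed casts should likewise fall back to 0) is a non-strict cast followed by fill_null:

# strict=False turns values that cannot be cast into nulls;
# fill_null(0) then substitutes 0, mirroring coalesce(..., lit(0)).
df = df.with_columns(
    pl.col('EMP_ID').cast(pl.Decimal(scale=0, precision=6), strict=False).fill_null(0)
)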