 

Change the Datatype of columns in PySpark dataframe

I have an input dataframe (ip_df), and the data in this dataframe looks like this:

id            col_value
1               10
2               11
3               12

The data type of both id and col_value is String.

I need to get another dataframe (output_df), with the data type of id as string and of col_value as decimal(15,4). There is no data transformation, just a data type conversion. Can I do this using PySpark? Any help will be appreciated.

asked Aug 02 '17 by Arunanshu P

People also ask

How do I change DataType of multiple columns in PySpark DataFrame?

Method 1: Using DataFrame.withColumn(). Call the cast(dataType) method on the column you want to convert, where dataType is the data type you want to change the respective column to, and pass the result to withColumn to replace the column.
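For example, a minimal sketch applied to the question's ip_df (the new_types mapping is just for illustration):

from pyspark.sql.types import StringType, DecimalType

# Map each column name to its target type (illustrative)
new_types = {"id": StringType(), "col_value": DecimalType(15, 4)}

# withColumn replaces an existing column when the name matches
for col_name, dtype in new_types.items():
    ip_df = ip_df.withColumn(col_name, ip_df[col_name].cast(dtype))

ip_df.printSchema()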

How do I change the DataType of a column in Databricks?

You can't rename or change a column's datatype in place in Databricks; you can only add new columns, reorder them, or add column comments. To change a datatype you must rewrite the table using the overwriteSchema option.
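As a rough sketch of such a rewrite (assuming a Delta table, here given the hypothetical name events, and an active SparkSession named spark):

from pyspark.sql.types import DecimalType

# Read the existing table, cast the column, then overwrite the
# table while allowing the schema to change
df = spark.table("events")
df = df.withColumn("col_value", df["col_value"].cast(DecimalType(15, 4)))
(df.write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("events"))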


1 Answer

Try using the cast method. Since the question asks for decimal(15,4), pass the precision and scale to DecimalType (a bare DecimalType() defaults to decimal(10,0)):

from pyspark.sql.types import DecimalType

output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4)))
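
A quick way to verify the conversion, sketched with the sample data from the question (assumes an active SparkSession named spark):

from pyspark.sql.types import DecimalType

# Build a small dataframe matching the sample data; both columns are strings
ip_df = spark.createDataFrame(
    [("1", "10"), ("2", "11"), ("3", "12")], ["id", "col_value"])

output_df = ip_df.withColumn(
    "col_value", ip_df["col_value"].cast(DecimalType(15, 4)))

output_df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- col_value: decimal(15,4) (nullable = true)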
answered Nov 15 '22 by aclowkay