 

Change the Datatype of columns in PySpark dataframe

I have an input dataframe (ip_df), and the data in this dataframe looks like this:

id            col_value
1               10
2               11
3               12

The data type of both id and col_value is String.

I need to get another dataframe (output_df), with the data type of id as string and of col_value as decimal(15,4). There is no data transformation, just a data type conversion. Can I do this using PySpark? Any help will be appreciated.

asked Aug 02 '17 by Arunanshu P

People also ask

How do I change DataType of multiple columns in PySpark DataFrame?

Method 1: Using DataFrame.withColumn(). Call the cast(dataType) method on the column you want to convert, where dataType is the data type you want to change the respective column to, and pass the result to withColumn to replace the column.
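For example, a minimal sketch applied to the question's ip_df (the new_types mapping is just for illustration):

from pyspark.sql.types import StringType, DecimalType

# Map each column name to its target type (illustrative)
new_types = {"id": StringType(), "col_value": DecimalType(15, 4)}

# withColumn replaces an existing column when the name matches
for col_name, dtype in new_types.items():
    ip_df = ip_df.withColumn(col_name, ip_df[col_name].cast(dtype))

ip_df.printSchema()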

How do I change the DataType of a column in Databricks?

You can't rename or change a column's datatype in place in Databricks; you can only add new columns, reorder them, or add column comments. To change a datatype you must rewrite the table using the overwriteSchema option.
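As a rough sketch of such a rewrite (assuming a Delta table, here given the hypothetical name events, and an active SparkSession named spark):

from pyspark.sql.types import DecimalType

# Read the existing table, cast the column, then overwrite the
# table while allowing the schema to change
df = spark.table("events")
df = df.withColumn("col_value", df["col_value"].cast(DecimalType(15, 4)))
(df.write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("events"))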


1 Answer

Try using the cast method. Since the question asks for decimal(15,4), pass the precision and scale to DecimalType (a bare DecimalType() defaults to decimal(10,0)):

from pyspark.sql.types import DecimalType

output_df = ip_df.withColumn("col_value", ip_df["col_value"].cast(DecimalType(15, 4)))
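
A quick way to verify the conversion, sketched with the sample data from the question (assumes an active SparkSession named spark):

from pyspark.sql.types import DecimalType

# Build a small dataframe matching the sample data; both columns are strings
ip_df = spark.createDataFrame(
    [("1", "10"), ("2", "11"), ("3", "12")], ["id", "col_value"])

output_df = ip_df.withColumn(
    "col_value", ip_df["col_value"].cast(DecimalType(15, 4)))

output_df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- col_value: decimal(15,4) (nullable = true)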
answered Nov 15 '22 by aclowkay