I have a mixed-type DataFrame that I am reading from a Hive table with the command
spark.sql('select a,b,c from table')
Some columns are int, bigint or double and others are string. There are 32 columns in total. Is there any way in PySpark to convert all columns of the DataFrame to string type?
In order to convert an array to a string, PySpark SQL provides the built-in function concat_ws(), which takes a delimiter of your choice as the first argument and an array column (type Column) as the second argument. To use concat_ws(), you need to import it from pyspark.sql.functions.
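For instance, a minimal sketch (the sample DataFrame and column names here are made up for illustration, and a SparkSession named spark is assumed):
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat_ws
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, ["a", "b", "c"])], ["id", "letters"])
# concat_ws(delimiter, array_column) joins the array elements into one string
df.select("id", concat_ws(",", "letters").alias("letters_str")).show()
# +---+-----------+
# | id|letters_str|
# +---+-----------+
# |  1|      a,b,c|
# +---+-----------+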
Use a numpy.dtype or Python type to cast an entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types.
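For example, a small pandas sketch (the DataFrame and column names are made up for illustration):
import pandas as pd
pdf = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
pdf_all_str = pdf.astype(str)        # cast every column to str
pdf_a_str = pdf.astype({"a": str})   # cast only column "a"
print(pdf_all_str.dtypes)            # both columns become object (string) dtype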
You can get the column names from a pandas DataFrame using df.columns.values, and pass this to the Python list() function to get them as a list; once you have the data you can print it using the print() statement.
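A quick sketch of that (the DataFrame is again invented for illustration):
import pandas as pd
pdf = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
cols = list(pdf.columns.values)
print(cols)  # ['a', 'b']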
In PySpark, you can cast or change a DataFrame column's data type using the cast() function of the Column class. In this article, I will be using withColumn(), selectExpr(), and SQL expressions to cast from String to Int (Integer Type), String to Boolean, etc., using PySpark examples.
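A minimal sketch of those three approaches (the DataFrame, column names and view name are hypothetical):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", "30")], ["name", "age"])  # age starts out as string
# 1) withColumn() + cast() on the Column
df.withColumn("age", col("age").cast("int")).printSchema()
# 2) selectExpr() with a SQL-style cast
df.selectExpr("name", "cast(age as int) as age").printSchema()
# 3) plain SQL expression against a temporary view
df.createOrReplaceTempView("people")
spark.sql("select name, cast(age as int) as age from people").printSchema()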
Just:
from pyspark.sql.functions import col
table = spark.sql("select a,b,c from table")
table = table.select([col(c).cast("string") for c in table.columns])  # cast every column to string
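Calling table.printSchema() afterwards should show every column as string; the Scala answer below walks through the same check.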
Here's a one-line solution in Scala:
df.select(df.columns.map(c => col(c).cast(StringType)) : _*)
Let's see an example here:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val data = Seq(
  Row(1, "a"),
  Row(5, "z")
)
val schema = StructType(
  List(
    StructField("num", IntegerType, true),
    StructField("letter", StringType, true)
  )
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)
df.printSchema
//root
//|-- num: integer (nullable = true)
//|-- letter: string (nullable = true)
val newDf = df.select(df.columns.map(c => col(c).cast(StringType)) : _*)
newDf.printSchema
//root
//|-- num: string (nullable = true)
//|-- letter: string (nullable = true)
I hope it helps.