
How to convert all columns of a DataFrame to numeric in Spark Scala?

I loaded a CSV file as a DataFrame. I would like to cast all columns to float, but the file has too many columns to write out every column name by hand:

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header",true).option("inferSchema", "true").csv("C:/Users/mhattabi/Desktop/dataTest2.csv")
user7394882 asked Feb 05 '23 23:02


1 Answer

Given this DataFrame as an example:

val df = spark.createDataFrame(Seq(("0", 0),("1", 1),("2", 0))).toDF("id", "c0")

with schema:

StructType(
    StructField(id,StringType,true), 
    StructField(c0,IntegerType,false))

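You can iterate over the column names returned by df.columns and fold over them, replacing each column with its float-cast version:

import org.apache.spark.sql.functions.col
val castedDF = df.columns.foldLeft(df)((current, c) => current.withColumn(c, col(c).cast("float")))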

So the new DF schema looks like:

StructType(
    StructField(id,FloatType,true), 
    StructField(c0,FloatType,false))
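
For anyone who wants to try this end to end, here is a minimal self-contained sketch written as a script (for example, pasted into spark-shell or a Scala worksheet with Spark on the classpath). The app name and the `local[*]` master are assumptions for local testing, and the sample data is the same toy DataFrame as above, not the CSV from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.FloatType

// Local session for experimentation only (app name and master are assumptions).
val spark = SparkSession.builder
  .master("local[*]")
  .appName("cast-all-columns")
  .getOrCreate()
import spark.implicits._

// Same toy data as in the answer: a string column and an integer column.
val df = Seq(("0", 0), ("1", 1), ("2", 0)).toDF("id", "c0")

// Start from df and replace each column, one at a time, with its float-cast version.
val castedDF = df.columns.foldLeft(df) { (current, c) =>
  current.withColumn(c, col(c).cast("float"))
}

// Inspect the result: both columns should now be reported as float.
castedDF.printSchema()
```

With the CSV from the question you would replace the toy `df` with the DataFrame returned by `spark.read...csv(...)`; the fold itself stays the same.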

EDIT:

If you want to exclude some columns from casting, you could do something like this (supposing we want to exclude the column id):

val exclude = Array("id")

val someCastedDF = df.columns.filterNot(c => exclude.contains(c)).foldLeft(df)((current, c) =>
  current.withColumn(c, col(c).cast("float")))

where exclude is an Array of all columns we want to exclude from casting.

So the schema of this new DF is:

StructType(
    StructField(id,StringType,true), 
    StructField(c0,FloatType,false))

Note that this may not be the best way to do it, but it can serve as a starting point.
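
One possible alternative, sketched here rather than taken from the answer above, is to build all the casts in a single `select` instead of chaining `withColumn` calls. The Spark documentation for `withColumn` notes that calling it many times in a loop can generate large query plans, which may matter for a file with very many columns. The names `projected` and the app name are hypothetical, and the toy data again stands in for the real CSV.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{FloatType, StringType}

// Local session for experimentation only (app name and master are assumptions).
val spark = SparkSession.builder
  .master("local[*]")
  .appName("cast-with-select")
  .getOrCreate()
import spark.implicits._

val df = Seq(("0", 0), ("1", 1), ("2", 0)).toDF("id", "c0")

// Columns to leave untouched, as in the answer's exclude example.
val exclude = Set("id")

// Build one expression per column: excluded columns pass through unchanged,
// all others are cast to float and explicitly keep their original name.
val projected = df.select(df.columns.map { name =>
  if (exclude.contains(name)) col(name)
  else col(name).cast("float").as(name)
}: _*)

projected.printSchema()
```

Passing an empty `exclude` set casts every column, which corresponds to the first variant of the answer.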

pheeleeppoo answered Feb 08 '23 00:02