
Convert string data in a DataFrame into double

I have a CSV file containing double-type data. When I load it into a DataFrame, I get an error telling me that java.lang.String cannot be cast to java.lang.Double, although my data are numeric. How do I get a DataFrame from this CSV file with columns of double type? How should I modify my code?

import org.apache.spark.sql.SparkSession

object Example extends App {

  val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()

  // Without an explicit schema, every column is read as StringType
  val data = spark.read.csv("C://lpsa.data")
    .toDF("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9")
  val data2 = data.select("col2", "col3", "col4", "col5", "col6", "col7")
}

What should I do to transform each of these columns into double type? Thanks.

asked by Hattabi Maher

2 Answers

Use select with cast:

import org.apache.spark.sql.functions.col

// Cast each listed column to double; values that cannot be parsed become null
val data2 = data.select(Seq("col2", "col3", "col4", "col5", "col6", "col7").map(
  c => col(c).cast("double")
): _*)
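
If you also want to keep the columns that are not cast, a common alternative is to fold over the column names with withColumn (a minimal sketch, assuming the same data DataFrame as above):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Cast the listed columns in place; all other columns pass through unchanged
val colsToCast = Seq("col2", "col3", "col4", "col5", "col6", "col7")
val casted: DataFrame = colsToCast.foldLeft(data)(
  (df, c) => df.withColumn(c, col(c).cast("double"))
)

casted.printSchema()  // the six listed columns should now report double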

Alternatively, pass a schema to the reader:

  • define the schema:

    import org.apache.spark.sql.types._
    
    val cols = Seq(
      "col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9"
    )
    
    val doubleCols = Set("col2", "col3", "col4", "col5", "col6", "col7")
    
    val schema = StructType(cols.map(
      c => StructField(c, if (doubleCols contains c) DoubleType else StringType)
    ))
    
  • and pass it as an argument to the schema method:

    spark.read.schema(schema).csv(path)
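
As a side note, on Spark 2.3 or later the schema method also accepts a DDL-formatted string, which saves building the StructType by hand (a sketch under that version assumption):

    // DDL-string equivalent of the StructType above (requires Spark 2.3+)
    val ddl = "col1 STRING, col2 DOUBLE, col3 DOUBLE, col4 DOUBLE, " +
      "col5 DOUBLE, col6 DOUBLE, col7 DOUBLE, col8 STRING, col9 STRING"

    spark.read.schema(ddl).csv(path)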
    

It is also possible to use schema inference:

spark.read.option("inferSchema", "true").csv(path)

but it is more expensive, because inferring the schema requires an additional pass over the data.

answered by zero323


I believe Spark's inferSchema option comes in handy when reading the CSV file. Below is the code to automatically detect your columns as double type:

// inferSchema makes Spark scan the data and choose numeric types automatically
val data = spark.read
                .format("csv")
                .option("header", "false")
                .option("inferSchema", "true")
                .load("C://lpsa.data").toDF()


Note: I am using Spark version 2.2.0.
answered by Saurabh Singh