 

How to create DataFrame from Scala's List of Iterables?

I have the following Scala value:

val values: List[Iterable[Any]] = Traces().evaluate(features).toList 

and I want to convert it to a DataFrame.

When I try the following:

sqlContext.createDataFrame(values) 

I got this error:

error: overloaded method value createDataFrame with alternatives:
  [A <: Product](data: Seq[A])(implicit evidence$2: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
cannot be applied to (List[Iterable[Any]])
          sqlContext.createDataFrame(values)

Why?
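
The error comes from the `[A <: Product]` bound in both overloads: createDataFrame accepts a Seq (or RDD) of tuples or case classes, and Iterable[Any] is neither a Product nor does it carry a usable TypeTag. A minimal sketch of the usual fix, using hypothetical sample data in place of Traces().evaluate(features) and assuming the rows are known (String, Int) pairs:

```scala
// Sketch only: "values" here is made-up sample data standing in for
// Traces().evaluate(features).toList, assumed to hold (String, Int) rows.
val values: List[Iterable[Any]] = List(Seq("a", 1), Seq("b", 2))

// createDataFrame needs A <: Product (a tuple or case class) with a
// concrete TypeTag, so map each untyped row onto a tuple first:
val rows: List[(String, Int)] = values.map(_.toSeq).map {
  case Seq(name: String, n: Int) => (name, n)
}

// rows is now a Seq of Products, which createDataFrame accepts:
// sqlContext.createDataFrame(rows)
```

If the rows do not all have the shape the pattern expects, the match will throw at runtime, so the assumed row shape matters.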

asked Jun 27 '16 21:06 by MTT

People also ask

How do you create a DataFrame in Pyspark from a list?

To do this, first create a list of data and a list of column names, zip them, and pass the zipped data to the spark.createDataFrame() method. This method creates the DataFrame.

How do you convert RDD to DF?

Converting a Spark RDD to a DataFrame can be done using toDF(), using createDataFrame(), or by transforming an RDD[Row] into a DataFrame with a schema.

What is import spark Implicits _?

In the Apache Spark source code, implicits is an object inside the SparkSession class. It extends SQLImplicits, declared as: object implicits extends SQLImplicits.


1 Answer

That's what the Spark implicits object is for. It allows you to convert common Scala collection types into a DataFrame / Dataset / RDD. Here is an example with Spark 2.0, but it exists in older versions too:

import org.apache.spark.sql.SparkSession

val values = List(1, 2, 3, 4, 5)

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

val df = values.toDF()

Edit: Just realised you were after a 2d list. Here is something I tried in spark-shell. I converted the 2d List to a List of tuples and used the implicit conversion to DataFrame:

val values = List(
  List("1", "One"),
  List("2", "Two"),
  List("3", "Three"),
  List("4", "4")
).map(x => (x(0), x(1)))

import spark.implicits._
val df = values.toDF
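
As a variation on the same idea, the 2d-list-to-tuple mapping can target a case class (also a Product), which gives the DataFrame named columns instead of the default _1/_2. A hedged sketch; the Entry class and its field names are my own invention, not from the original answer:

```scala
// Hypothetical case class so the resulting columns would be named
// id/label rather than the default _1/_2 from a tuple.
case class Entry(id: String, label: String)

val raw = List(List("1", "One"), List("2", "Two"), List("3", "Three"))

// Map each 2-element inner list onto the case class:
val entries: List[Entry] = raw.map { case List(id, label) => Entry(id, label) }

// With a SparkSession in scope, this converts the same way:
// import spark.implicits._
// val df = entries.toDF()
```

A case class also documents the row shape, so a malformed inner list fails at the map step rather than deep inside Spark.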

Edit2: The original question by MTT was how to create a Spark DataFrame from a Scala list for a 2d list, for which this is a correct answer. The original revision is https://stackoverflow.com/revisions/38063195/1. The question was later changed to match an accepted answer. I'm adding this edit so that anyone looking for something similar to the original question can find it.

answered Sep 30 '22 09:09 by sparker