
Trying to create dataframe with two columns [Seq(), String] - Spark

When I run the following on the spark-shell, I get a dataframe:

scala> val df = Seq(Array(1,2)).toDF("a")

scala> df.show(false)
+------+
|a     |
+------+
|[1, 2]|
+------+

But when I run the following to create a dataframe with two columns, I get an error:

scala> val df1 = Seq(Seq(Array(1,2)),"jf").toDF("a","b")
<console>:23: error: value toDF is not a member of Seq[Object]
    val df1 = Seq(Seq(Array(1,2)),"jf").toDF("a","b")

How do I go about this? Is toDF only supported for sequences with primitive datatypes?

Shibani asked Apr 26 '26 15:04


1 Answer

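toDF needs a Seq of tuples: each tuple is one row, and each tuple element becomes one column. In your attempt, Seq(Seq(Array(1,2)),"jf") is a two-element Seq that mixes a Seq and a String, so its element type is inferred as Object, and there is no toDF for Seq[Object]. toDF is not limited to primitive types; tuples (and case classes) of supported types work as well: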

val df1 = Seq((Array(1,2),"jf")).toDF("a","b")
// df1: org.apache.spark.sql.DataFrame = [a: array<int>, b: string]

df1.show
+------+---+
|     a|  b|
+------+---+
|[1, 2]| jf|
+------+---+

Add more tuples for more rows:

val df1 = Seq((Array(1,2),"jf"), (Array(2), "ab")).toDF("a","b")
// df1: org.apache.spark.sql.DataFrame = [a: array<int>, b: string]

df1.show
+------+---+
|     a|  b|
+------+---+
|[1, 2]| jf|
|   [2]| ab|
+------+---+
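
To see why the original call fails, it may help to compare the element types in plain Scala. The following is only a sketch using the standard library (no Spark needed); the names mixed, rows, a and b are illustrative and do not come from the question.

```scala
// Plain Scala; can be pasted into the Scala REPL or spark-shell.

// Original attempt: a Seq with two elements of unrelated types.
// The compiler can only find a common supertype for the elements
// (annotated as Any here), so no column structure can be derived.
val mixed: Seq[Any] = Seq(Seq(Array(1, 2)), "jf")

// Fixed version: a Seq with a single element, which is a 2-tuple.
// The element type keeps both field types, one per intended column.
val rows: Seq[(Array[Int], String)] = Seq((Array(1, 2), "jf"))

// The inner Seq and the String are separate elements of `mixed`,
// while `rows` holds exactly one row with two fields.
val mixedSize = mixed.size
val rowCount = rows.size

// Destructure the single row into its two fields.
val (a, b) = rows.head
```

Here mixed has two elements and rows has one, which matches the intent of one row with two columns.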
Psidom answered Apr 29 '26 12:04


