Creating a Spark DataFrame from a single string

Tags: scala, apache-spark, spark-dataframe

I'm trying to take a hardcoded String and turn it into a 1-row Spark DataFrame (with a single column of type StringType) such that:

String fizz = "buzz"

would result in a DataFrame whose .show() output looks like:

+-----+
| fizz|
+-----+
| buzz|
+-----+

My best attempt thus far has been:

val rawData = List("fizz")
val df = sqlContext.sparkContext.parallelize(Seq(rawData)).toDF()

df.show()

But I get the following runtime exception:

java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:413)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)

Any ideas as to where I'm going awry? Also, how do I set "buzz" as the row value for the fizz column?

Update:

Trying:

sqlContext.sparkContext.parallelize(rawData).toDF()

I get a DF that looks like:

+----+
|  _1|
+----+
|buzz|
+----+


asked Oct 10 '16 17:10

smeeb

1 Answer

Try:

sqlContext.sparkContext.parallelize(rawData).toDF()
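As for where the original attempt goes awry: `parallelize(Seq(rawData))` builds an RDD with a single element of type `List[String]`. Spark 1.x's `rddToDataFrameHolder` accepts any `Product` element type, and in the Scala versions Spark 1.x was built against `List` extends `Product`, so the call compiles. Spark then infers an `ArrayType` for a `List` instead of the `StructType` it expects for a row, which is most likely the source of the `ClassCastException`. Passing `rawData` directly gives an `RDD[String]` with one row per element. If you want the column name to come from the row type itself, a case class is one option. The sketch below assumes Spark 2.x or later with `spark-sql` on the classpath and a local master; `FizzRow` and `CaseClassRows` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical row type: its field name `fizz` becomes the column name.
// Defined at top level (not inside a method) so Spark can derive an encoder for it.
case class FizzRow(fizz: String)

object CaseClassRows {
  // Builds a one-row DataFrame from an RDD of case-class instances.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Each RDD element is one row; each case-class field becomes one column.
    spark.sparkContext.parallelize(Seq(FizzRow("buzz"))).toDF()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for trying this out.
    val spark = SparkSession.builder().master("local[1]").appName("case-class-rows").getOrCreate()
    try {
      build(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```

Because the schema is derived from `FizzRow`, no column names need to be passed to `toDF()`.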

In Spark 2.0 you can:

import spark.implicits._

rawData.toDF
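A self-contained version of the Spark 2.0 snippet might look like the sketch below. It assumes Spark 2.x or later with `spark-sql` on the classpath and a local master; `LocalSeqToDF` is a hypothetical name. Note that `import spark.implicits._` needs an actual `SparkSession` value named `spark` in scope.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LocalSeqToDF {
  // Turns a local Scala collection into a DataFrame without an explicit RDD.
  def build(spark: SparkSession): DataFrame = {
    // The implicits must be imported from a concrete SparkSession value.
    import spark.implicits._
    val rawData = List("buzz")
    // With no names given, a single-string column is labeled "value".
    rawData.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("local-seq-to-df").getOrCreate()
    try {
      build(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```

Without explicit names, Spark 2.x calls the column `value`; the `_1` shown in the question's Update appears to come from the Spark 1.x implicits. Either way, a name has to be supplied to get `fizz`.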

Optionally, pass the column names to toDF:

sqlContext.sparkContext.parallelize(rawData).toDF("fizz")
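Putting it together for the exact case in the question (a column named `fizz` of type `StringType` holding `buzz`), a minimal sketch of both the local-collection and the RDD routes follows. It assumes Spark 2.x or later with `spark-sql` on the classpath and a local master; `NamedColumn`, `fromSeq`, `fromRdd` and `isStringColumn` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StringType

object NamedColumn {
  // Local-collection route: one row, one column named "fizz".
  def fromSeq(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq("buzz").toDF("fizz")
  }

  // RDD route, as in the answer: same result, via parallelize.
  def fromRdd(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.sparkContext.parallelize(Seq("buzz")).toDF("fizz")
  }

  // True when `name` exists in the schema and is typed as StringType.
  def isStringColumn(df: DataFrame, name: String): Boolean =
    df.schema.fieldNames.contains(name) && df.schema(name).dataType == StringType

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("named-column").getOrCreate()
    try {
      val df = fromSeq(spark)
      df.printSchema()
      df.show()
    } finally {
      spark.stop()
    }
  }
}
```

`toDF(...)` renames columns by position, so the number of names passed must match the number of columns; a mismatch fails at runtime rather than at compile time.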

answered Oct 13 '22 02:10

2 revs user6022341
