Convert a List into a DataFrame in Spark Scala

Tags:

scala

apache-spark

apache-spark-sql

spark-dataframe

I have a list with more than 30 strings. How can I convert the list into a DataFrame? What I tried:

val list = List("a","b","v","b").toDS().toDF()

Output:


+-------+
|  value|
+-------+
|a      |
|b      |
|v      |
|b      |
+-------+


Expected output:

+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  a|  b|  v|  b|
+---+---+---+---+

Any help would be appreciated.


asked Jan 26 '17 04:01

senthil kumar p

1 Answer

List("a","b","c","d") is a list of four single-field records, so the resulting DataFrame has a single column (named value) and shows one element per row.

To get the expected output, each row needs four fields. So wrap the elements in a tuple: List(("a","b","c","d")) is a list containing one row (a four-element tuple) with four fields. Similarly, a list with two rows is written as List(("a1","b1","c1","d1"),("a2","b2","c2","d2")).
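In spark-shell, sc and the implicit conversions behind toDF() are already in scope; in a standalone application they are not. A minimal sketch of the same tuple-wrapping idea, assuming Spark 2.x or later and a local SparkSession (the object name, app name and local[*] master are arbitrary choices for illustration), could look like this:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TupleToDF {
  // Builds the same two DataFrames as the REPL session, from local Seqs of tuples.
  def build(spark: SparkSession): (DataFrame, DataFrame) = {
    // Brings toDF() into scope; spark-shell imports this automatically.
    import spark.implicits._

    // One Tuple4 = one row with four string columns named _1 to _4.
    val oneRow = Seq(("a", "b", "c", "d")).toDF()

    // Two Tuple4s = two rows with the same four columns.
    val twoRows = Seq(("a1", "b1", "c1", "d1"), ("a2", "b2", "c2", "d2")).toDF()

    (oneRow, twoRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuple-to-df")
      .master("local[*]")
      .getOrCreate()

    val (oneRow, twoRows) = build(spark)
    oneRow.show()
    twoRows.show()

    spark.stop()
  }
}
```

Here toDF() is called directly on a local Seq, so sc.parallelize is not required; both forms produce the _1 to _4 column names.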

scala> val list = sc.parallelize(List(("a", "b", "c", "d"))).toDF()
list: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: string, _4: string]

scala> list.show
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+


scala> val list = sc.parallelize(List(("a1","b1","c1","d1"),("a2","b2","c2","d2"))).toDF
list: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: string, _4: string]

scala> list.show
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
| a1| b1| c1| d1|
| a2| b2| c2| d2|
+---+---+---+---+
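One caveat for the original question: Scala 2 defines tuple classes only up to Tuple22, so wrapping more than 22 strings in a tuple will not compile, and the question mentions more than 30 strings. A sketch that avoids tuples, assuming Spark 2.x or later, builds each row with Row.fromSeq and supplies an explicit StructType schema whose column names _1 ... _n imitate the tuple-style names above. The helper name toWideDF and the generated sample strings are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object WideListToDF {
  // Hypothetical helper: each inner Seq becomes one row; columns are named _1.._n.
  // Not limited to 22 columns because no tuples are involved.
  def toWideDF(spark: SparkSession, rows: Seq[Seq[String]]): DataFrame = {
    val width = rows.headOption.map(_.length).getOrElse(0)
    require(rows.forall(_.length == width), "all rows must have the same length")

    // One nullable string column per position: _1, _2, ..., _width.
    val schema = StructType(
      (1 to width).map(i => StructField(s"_$i", StringType, nullable = true))
    )

    // Row.fromSeq turns each Seq[String] into a generic Row matching the schema.
    val rowRdd = spark.sparkContext.parallelize(rows.map(r => Row.fromSeq(r)))
    spark.createDataFrame(rowRdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wide-list-to-df")
      .master("local[*]")
      .getOrCreate()

    // 30 sample strings s1 ... s30, standing in for the real list.
    val strings = (1 to 30).map(i => s"s$i").toList

    // Wrapping the list in Seq(...) makes it a single row with 30 columns.
    val df = toWideDF(spark, Seq(strings))
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Passing several inner Seqs, for example Seq(row1, row2), yields one DataFrame row per inner Seq, as in the two-row tuple example. The require check fails fast with a clear message if the rows have different lengths.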


answered Oct 24 '22 09:10

SrinR
