I have a list with more than 30 strings. How do I convert the list into a DataFrame? What I tried, e.g.:
val list = List("a","b","v","b").toDS().toDF()
Output :
+-----+
|value|
+-----+
|    a|
|    b|
|    v|
|    b|
+-----+
Expected output is
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  a|  b|  v|  b|
+---+---+---+---+
Any help on this?
There are two common ways to do this. One is to build the data as a sequence of rows together with a list of column names, pass the data to spark.createDataFrame(), and apply the column names with toDF(names: _*). The other is the toDF() method on its own, which provides a very concise way to create a DataFrame from a sequence of objects; to access it you have to import spark.implicits._. Both are sketched below.
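For example, a minimal sketch combining both (the column names col1..col4 and the SparkSession setup are illustrative; in spark-shell, spark and its implicits are already available):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("list-to-df").master("local[*]").getOrCreate()
import spark.implicits._

// Data as a sequence of rows (tuples) plus a separate list of column names.
val data = Seq(("a", "b", "c", "d"))
val columns = Seq("col1", "col2", "col3", "col4")

// createDataFrame infers the schema from the tuple fields; toDF(...) renames the columns.
val df1 = spark.createDataFrame(data).toDF(columns: _*)

// toDF() alone keeps the default column names _1, _2, _3, _4.
val df2 = data.toDF()

df1.show()
df2.show()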
List("a","b","c","d")
represents a record with one field and so the resultset displays one element in each row.
To get the expected output, the row should have four fields/elements in it. So, we wrap around the list as List(("a","b","c","d"))
which represents one row, with four fields.
In a similar fashion a list with two rows goes as List(("a1","b1","c1","d1"),("a2","b2","c2","d2"))
scala> val list = sc.parallelize(List(("a", "b", "c", "d"))).toDF()
list: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: string, _4: string]
scala> list.show
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
| a| b| c| d|
+---+---+---+---+
scala> val list = sc.parallelize(List(("a1","b1","c1","d1"),("a2","b2","c2","d2"))).toDF
list: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: string, _4: string]
scala> list.show
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
| a1| b1| c1| d1|
| a2| b2| c2| d2|
+---+---+---+---+
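Note that Scala tuples only go up to 22 elements, so for the original list of 30+ strings the tuple wrapping above won't compile. A minimal sketch of an alternative, assuming spark.createDataFrame() with an explicit schema (the col1, col2, ... names are made up):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("list-to-row").master("local[*]").getOrCreate()

// The original list; in practice it would hold 30+ strings.
val values = List("a", "b", "v", "b")

// One string column per list element, named col1, col2, ...
val schema = StructType(values.indices.map(i => StructField(s"col${i + 1}", StringType, nullable = true)))

// A single Row whose fields are the list elements.
val row = Row.fromSeq(values)

val df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(row)), schema)
df.show()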