Spark version: 2.1
For example, in PySpark, I create a list:
test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
How do I create a DataFrame from test_list so that the DataFrame's type looks like this?
DataFrame[words: array<string>]
To convert a Spark DataFrame column to a list (in the Scala API), first select() the column you want, then use the map() transformation to convert each Row to a String, and finally collect() the data to the driver, which returns an Array[String].
To do this, first create a list of data and a list of column names. Then pass the zipped data to the spark.createDataFrame() method, which creates the DataFrame.
You can manually create a PySpark DataFrame using the toDF() and createDataFrame() methods. The two take different signatures for creating a DataFrame from an existing RDD, list, or DataFrame.
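As a rough illustration of the "zipped data" idea, here is a minimal sketch. The sample values, the column names, and the helper name build_df are assumptions made for the example, and spark is assumed to be an existing SparkSession. With only column names given, Spark infers the column types from the data.

```python
# Sample data (assumed for illustration): one list per column.
words = ['Hello', 'I am fine']
totals = [1, 3]
columns = ['Words', 'total']

# zip() pairs the values up so that each tuple becomes one row.
rows = list(zip(words, totals))
print(rows)


def build_df(spark, rows, columns):
    """Hypothetical helper: `spark` is assumed to be an existing SparkSession."""
    # Passing a list of column names lets Spark infer the column types.
    return spark.createDataFrame(rows, columns)
```

Calling build_df(spark, rows, columns) in a PySpark session should then give a two-column DataFrame.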
Here is how:
from pyspark.sql.types import *

cSchema = StructType([StructField("WordList", ArrayType(StringType()))])

# notice the extra square brackets around each element of the list
test_list = [['Hello', 'world']], [['I', 'am', 'fine']]

df = spark.createDataFrame(test_list, schema=cSchema)
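The extra brackets are needed because each element of the data is one row, and each row here has a single column whose value is itself a list. If you already have the list in its original shape, a sketch along these lines can do the wrapping for you. The helper names are hypothetical, spark is assumed to be an existing SparkSession, and the column is named words to match the type asked for in the question.

```python
def wrap_as_single_column_rows(lists):
    """Turn each inner list into a one-element tuple, i.e. a one-column row."""
    return [(words,) for words in lists]


def build_words_df(spark, lists):
    """Hypothetical helper: `spark` is assumed to be an existing SparkSession."""
    # Imported here so the wrapping helper above works without Spark installed.
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    schema = StructType([StructField("words", ArrayType(StringType()))])
    return spark.createDataFrame(wrap_as_single_column_rows(lists), schema=schema)


test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
rows = wrap_as_single_column_rows(test_list)
print(rows)  # [(['Hello', 'world'],), (['I', 'am', 'fine'],)]
```

In a PySpark session, build_words_df(spark, test_list) should then report its type as DataFrame[words: array<string>].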
I had to work with multiple columns and types. The example below has one string column and one integer column. A slight adjustment to Pushkr's code (above) gives:
from pyspark.sql.types import *

cSchema = StructType([StructField("Words", StringType()),
                      StructField("total", IntegerType())])

test_list = [['Hello', 1], ['I am fine', 3]]

df = spark.createDataFrame(test_list, schema=cSchema)
Output:
df.show()
+---------+-----+
|    Words|total|
+---------+-----+
|    Hello|    1|
|I am fine|    3|
+---------+-----+