StructType can not accept object?

Tags:

pyspark

How do I resolve this issue?

rdd.collect()  # ['3e866d48b59e8ac8aece79597df9fb4c'...]

rdd.toDF()     # Can not infer schema for type: <type 'str'>

from pyspark.sql.types import StructType, StructField, StringType

myschema = StructType([StructField("col1", StringType(), True)])
rdd.toDF(myschema).show()

# StructType can not accept object "3e866d48b59e8ac8aece79597df9fb4c" in type

Bala asked Dec 11 '22 08:12



1 Answer

It seems you have:

rdd = sc.parallelize(['3e866d48b59e8ac8aece79597df9fb4c'])

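That is a one-dimensional structure: an RDD of plain strings. A DataFrame is two-dimensional, so each element must be a row-like object (a tuple, list, or Row) rather than a bare string. Mapping each string to a one-element tuple solves the problem: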

rdd.map(lambda x: (x,)).toDF().show()
+--------------------+
|                  _1|
+--------------------+
|3e866d48b59e8ac8a...|
+--------------------+
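
The same wrapping also lets the explicit schema from the question be applied. The following is only a minimal sketch: the helper names `to_rows` and `hashes_to_df` are hypothetical, `spark` is assumed to be an existing SparkSession, and `col1` is the column name taken from the question.

```python
def to_rows(values):
    """Wrap each scalar in a one-element tuple so it has the shape of a row."""
    return [(v,) for v in values]


def hashes_to_df(spark, values, column="col1"):
    """Build a single-column string DataFrame from plain Python strings.

    `spark` is assumed to be an existing SparkSession (hypothetical caller input).
    """
    # Imported lazily so that to_rows() can be used without Spark installed.
    from pyspark.sql.types import StructType, StructField, StringType

    # One nullable string field, matching the schema in the question.
    schema = StructType([StructField(column, StringType(), True)])

    # Each element is now a 1-tuple, so it lines up with the one-field schema.
    return spark.createDataFrame(to_rows(values), schema)
```

Starting from the RDD in the question, the equivalent call would be `rdd.map(lambda x: (x,)).toDF(myschema).show()`, which should name the column `col1` instead of the default `_1`.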
Psidom answered Jan 03 '23 05:01
