The case is really simple: I need to convert a Python list into a DataFrame with the following code:
from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import StringType, IntegerType

schema = StructType([StructField("value", IntegerType(), True)])
my_list = [1, 2, 3, 4]
rdd = sc.parallelize(my_list)
df = sqlContext.createDataFrame(rdd, schema)
df.show()
It failed with the following error:
raise TypeError("StructType can not accept object %r in type %s" % (obj, type(obj)))
TypeError: StructType can not accept object 1 in type <class 'int'>
To do this, first create a list of data and a list of column names, then pass both to spark.createDataFrame(). This method builds a DataFrame from an RDD, a list, or a pandas DataFrame. The data must be a list of tuples, one tuple per row, and columns must be the list of column names. That tuple-per-row requirement is exactly why the original snippet fails: StructType cannot accept a bare int as a row.
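A minimal sketch of that approach, assuming a SparkSession is available as spark (the variable names here are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

my_list = [1, 2, 3, 4]
columns = ["value"]

# createDataFrame expects one tuple (or Row) per row, not bare ints,
# which is what the StructType error above was complaining about
data = [(x,) for x in my_list]

df = spark.createDataFrame(data, columns)
df.show()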
This solution is also an approach that uses less code, avoids serialization to an RDD, and is likely easier to understand:
from pyspark.sql.types import IntegerType

# notice the variable name (more below)
mylist = [1, 2, 3, 4]

# notice the parens after the type name
spark.createDataFrame(mylist, IntegerType()).show()
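Note that when an atomic type such as IntegerType() is passed as the schema, each list element becomes a single-column row, and Spark names that column value by default; you can rename it afterwards with withColumnRenamed() if needed.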
NOTE: About naming your variable list: the term list is a Python builtin, and as such it is strongly recommended that we avoid using builtin names as the name/label for our variables, because we end up overwriting things like the list() function. When prototyping something fast and dirty, a number of folks use something like mylist instead.
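A quick illustration of the shadowing problem (hypothetical snippet):

list = [1, 2, 3]   # shadows the builtin list type
list((4, 5))       # TypeError: 'list' object is not callable
del list           # removes the shadow and restores the builtin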
Please see the code below:
from pyspark.sql import Row

li = [1, 2, 3, 4]
rdd1 = sc.parallelize(li)
row_rdd = rdd1.map(lambda x: Row(x))
df = sqlContext.createDataFrame(row_rdd, ['numbers'])
df.show()

(The original snippet assigned the result of .show() to df, but .show() only prints and returns None; assigning the DataFrame first keeps df usable afterwards.)
+-------+
|numbers|
+-------+
|      1|
|      2|
|      3|
|      4|
+-------+
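As a side note, and assuming an active SparkSession, the same DataFrame can also be built with the toDF() shortcut on the row RDD, e.g. row_rdd.toDF(['numbers']).show().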