I'm facing a little issue when creating a dataframe:
from pyspark.sql import SparkSession, types
spark = SparkSession.builder.appName('test').getOrCreate()
df_test = spark.createDataFrame(
    ['a string', 1],
    schema=[
        types.StructField('col1', types.StringType(), True),
        types.StructField('col2', types.IntegerType(), True)
    ]
)
## AttributeError: 'StructField' object has no attribute 'encode'
I don't see anything wrong with my code (it's so simple I feel really dumb). But I can't get this to work. Can you point me in the right direction?
You were most of the way there!

When you call createDataFrame specifying a schema, the schema needs to be a StructType. An ordinary list of StructField objects isn't enough.
- Create an RDD of tuples or lists from the original RDD;
- Create the schema represented by a StructType matching the structure of the tuples or lists in the RDD created in step 1;
- Apply the schema to the RDD via the createDataFrame method provided by SparkSession.
Also, the first argument to createDataFrame is a list of rows, not a list of values for one row, so a single flat list will cause errors. Wrapping each row in a dict that explicitly maps column names to values is one solution, but there are others.
The result should look something like:
df_test = spark.createDataFrame(
    [{'col1': 'a string', 'col2': 1}],
    schema=types.StructType([
        types.StructField('col1', types.StringType(), True),
        types.StructField('col2', types.IntegerType(), True)
    ])
)