I'm trying to construct a schema for database testing, and StructType apparently isn't working for some reason. I'm following a tutorial, and it doesn't import any extra module.
<type 'exceptions.NameError'>, NameError("name 'StructType' is not defined",), <traceback object at 0x2b555f0>)
I'm on Spark 1.4.0 and Ubuntu 12, if that has anything to do with the problem. How would I fix this? Thank you in advance.
Did you import StructType? If not,
from pyspark.sql.types import StructType
should solve the problem.
from pyspark.sql.types import StructType
That would fix it, but next you might get
NameError: name 'IntegerType' is not defined
or
NameError: name 'StringType' is not defined
and so on.
To avoid all of that just do:
from pyspark.sql.types import *
Alternatively, import all the types you require one by one:
from pyspark.sql.types import StructType, IntegerType, StringType
All Types: Apache Spark Data Types