
Pyspark StructType is not defined


I'm trying to define a schema for database testing, but StructType fails with a NameError. I'm following a tutorial, and it doesn't import any extra modules.

<type 'exceptions.NameError'>, NameError("name 'StructType' is not defined",), <traceback object at 0x2b555f0>) 

I'm on Spark 1.4.0 and Ubuntu 12, in case that is relevant. How do I fix this? Thank you in advance.

Joseph Seung Jae Dollar asked Jun 18 '15 02:06



People also ask

What is StructType in PySpark?

A StructType in PySpark is a collection of StructField objects. Each StructField represents one field of the StructType and defines the column name, the column data type, a boolean that specifies whether the field can be null, and optional metadata.

What is struct field in PySpark?

StructField() describes a single column of a schema. It takes the column name as the first parameter and the data type of that column as the second parameter. The data types must be imported from the pyspark.sql.types module.

How do I select a struct in PySpark?

If you have a struct (StructType) column in a PySpark DataFrame, you need to use an explicit column qualifier to select the nested fields of the struct.

How does PySpark define array type?

You can create an instance of an ArrayType using the ArrayType() class. It takes an elementType argument and one optional argument, containsNull, which specifies whether the elements can be null and defaults to True. elementType should be a PySpark type that extends the DataType class.


2 Answers

Did you import StructType? If not

from pyspark.sql.types import StructType 

should solve the problem.

zero323 answered Oct 07 '22 00:10



from pyspark.sql.types import StructType 

That would fix it, but next you might get NameError: name 'IntegerType' is not defined or NameError: name 'StringType' is not defined.

To avoid all of that, just do:

from pyspark.sql.types import * 

Alternatively, import all the types you require one by one:

from pyspark.sql.types import StructType, IntegerType, StringType 

All Types: Apache Spark Data Types

Ani Menon answered Oct 07 '22 02:10
