
unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on an Apache Spark DataFrame

I'm getting an error when trying to cast a StringType to an IntegerType on a PySpark DataFrame:

joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
    .select(aggregates.year,'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")

I'm getting:

TypeError                                 Traceback (most recent call last)
in <module>()
      1 joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
            .select(aggregates.year,'Production')\
            .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
            .drop("Production")\
            .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>

asked Nov 20 '16 by Romeo Kienzler


People also ask

How do I change the DataType of a Column in a Spark DataFrame?

To change a Spark SQL DataFrame column from one data type to another, use the cast() function of the Column class. You can use it with withColumn(), select(), selectExpr(), and SQL expressions, as sketched below.
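
For illustration, a minimal sketch of all four variants, assuming a SparkSession named spark and a hypothetical DataFrame df with a string column "age" (both names are assumptions, not from the question):

from pyspark.sql.functions import col

df1 = df.withColumn("age", col("age").cast("int"))             # via withColumn()
df2 = df.select(col("age").cast("int").alias("age"))           # via select()
df3 = df.selectExpr("CAST(age AS INT) AS age")                 # via selectExpr()
df.createOrReplaceTempView("people")                           # register for SQL
df4 = spark.sql("SELECT CAST(age AS INT) AS age FROM people")  # via SQL expression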

How do you check variable type in PySpark?

You can find all column names and data types (DataType) of a PySpark DataFrame by using df.dtypes and df.schema, and you can retrieve the data type of a specific column using df.schema["name"].dataType; see the sketch below.
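
A minimal sketch, reusing the hypothetical df with a string column "age" from above (exact reprs vary by Spark version):

print(df.dtypes)                  # e.g. [('age', 'string')]
print(df.schema)                  # a StructType with one StructField per column
print(df.schema["age"].dataType)  # e.g. StringType()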

What is StringType() in PySpark?

StringType is the PySpark SQL data type for string data. Like every DataType subclass, it provides fromInternal(obj), which converts an internal SQL object into a native Python object.
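
A minimal sketch (for StringType, fromInternal is effectively a pass-through, since strings need no conversion):

from pyspark.sql.types import StringType

StringType()                       # an instance, ready to pass to cast()
StringType().fromInternal("wool")  # returns 'wool'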

How do you convert a Column to a String in PySpark?

To convert an array to a string, PySpark SQL provides the built-in function concat_ws(), which takes a delimiter of your choice as the first argument and the array column (type Column) as the second. To use concat_ws(), import it from pyspark.sql.functions; see the sketch below.
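
A minimal sketch, assuming a SparkSession named spark and an illustrative array<string> column "tags":

from pyspark.sql.functions import concat_ws, col

df = spark.createDataFrame([(["a", "b", "c"],)], ["tags"])
df.select(concat_ws(",", col("tags")).alias("tags_str")).show()
# +--------+
# |tags_str|
# +--------+
# |   a,b,c|
# +--------+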


1 Answer

PySpark SQL data types are no longer singletons (they were before Spark 1.3). You have to create an instance:

from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>

In contrast to:

col("foo").cast(IntegerType)
TypeError  
   ...
TypeError: unexpected type: <class 'type'>

The cast method can also be used with string descriptions:

col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>
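
As a small side sketch (standard Spark behavior, not shown in the original answer), the short DDL-style type names are accepted as well:

col("foo").cast("int")     # same as "integer"
col("foo").cast("string")
col("foo").cast("double")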

For an overview of the supported data types in Spark SQL and DataFrames, see the Spark SQL documentation.
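
Applied to the snippet from the question, the fix is to instantiate the type (IntegerType() instead of IntegerType), or equivalently to use .cast("integer"); a sketch reusing the asker's own DataFrames:

from pyspark.sql.types import IntegerType

joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
    .select(aggregates.year,'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType()))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")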

answered Sep 17 '22 by zero323