I'm getting an error when trying to cast a StringType column to IntegerType on a PySpark DataFrame:
joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
.select(aggregates.year,'Production')\
.withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
.drop("Production")\
.withColumnRenamed("ProductionTmp", "Production")
I'm getting:
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL') \
                       .select(aggregates.year, 'Production') \
                       .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType)) \
                       .drop("Production") \
                       .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)

TypeError: unexpected type: <class 'type'>
To change a Spark SQL DataFrame column from one data type to another, use the cast() function of the Column class; you can apply it inside withColumn(), select(), selectExpr(), or a SQL expression.
You can list all column names and data types of a PySpark DataFrame with df.dtypes or df.schema, and you can retrieve the data type of a specific column with df.schema["name"].dataType.
PySpark SQL data types are no longer singletons (they were before 1.3). You have to create an instance:
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col
col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>
In contrast to:
col("foo").cast(IntegerType)
TypeError
...
TypeError: unexpected type: <class 'type'>
The cast method can also be used with string descriptions:
col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>
For an overview of the supported data types in Spark SQL and DataFrames, see the Spark SQL Data Types documentation.