
What is the Scala type mapping for all Spark SQL DataType

The different DataTypes available for Spark SQL can be found here. Can anyone tell me what the corresponding Java/Scala data type is for each of Spark SQL's DataTypes?

aa8y asked Oct 02 '15 02:10

People also ask

What type of SQL does Spark SQL use?

Hive integration: run SQL or HiveQL queries on existing warehouses. Spark SQL supports the HiveQL syntax as well as Hive SerDes and UDFs, allowing you to access existing Hive warehouses. Spark SQL can also use existing Hive metastores, SerDes, and UDFs.

What is map in Spark Scala?

Spark's map() is a transformation operation that applies a function to every element of an RDD, DataFrame, or Dataset, returning a new RDD or Dataset.
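As a minimal sketch of the map() transformation on a Dataset (assuming spark-sql is on the classpath; the session and value names here are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: a local session purely for demonstration.
val spark = SparkSession.builder().master("local[1]").appName("map-example").getOrCreate()
import spark.implicits._

// map() applies the given function to every element, yielding a new Dataset.
val ds = Seq(1, 2, 3).toDS()
val doubled = ds.map(_ * 2)
val result = doubled.collect().sorted

spark.stop()
```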

What is Spark SQL in Scala?

Spark SQL allows relational queries expressed in SQL, HiveQL, or Scala to be executed using Spark. At the core of this component is a new type of RDD, SchemaRDD. SchemaRDDs are composed of Row objects, along with a schema that describes the data types of each column in the row.

What are three complex data types you can work with using Spark SQL?

I read that Spark SQL has three complex data types: ArrayType, MapType, and StructType.
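A small sketch of the three complex types composed into one schema (assuming spark-sql on the classpath; the field names are illustrative):

```scala
import org.apache.spark.sql.types._

// A nested record to demonstrate StructType-in-StructType.
val address = StructType(Seq(
  StructField("city", StringType),
  StructField("zip", StringType)
))

val schema = StructType(Seq(
  StructField("labels", ArrayType(StringType)),           // ArrayType: ordered values of one element type
  StructField("attrs", MapType(StringType, IntegerType)), // MapType: key/value pairs
  StructField("address", address)                         // StructType: a nested record
))

println(schema.simpleString)
```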


2 Answers

For those trying to find the Java types, they're now also hosted at the link from zero323's answer. To document the current revision here:

Data type     |    Value type in Java              |    API to access or create a data type
-------------------------------------------------------------------------------------------
ByteType      |    byte or Byte                    |    DataTypes.ByteType
ShortType     |    short or Short                  |    DataTypes.ShortType
IntegerType   |    int or Integer                  |    DataTypes.IntegerType
LongType      |    long or Long                    |    DataTypes.LongType
FloatType     |    float or Float                  |    DataTypes.FloatType
DoubleType    |    double or Double                |    DataTypes.DoubleType
DecimalType   |    java.math.BigDecimal            |    DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale)
StringType    |    String                          |    DataTypes.StringType
BinaryType    |    byte[]                          |    DataTypes.BinaryType
BooleanType   |    boolean or Boolean              |    DataTypes.BooleanType
TimestampType |    java.sql.Timestamp              |    DataTypes.TimestampType
DateType      |    java.sql.Date                   |    DataTypes.DateType
ArrayType     |    java.util.List                  |    DataTypes.createArrayType(elementType) or DataTypes.createArrayType(elementType, containsNull)
MapType       |    java.util.Map                   |    DataTypes.createMapType(keyType, valueType) or DataTypes.createMapType(keyType, valueType, valueContainsNull)
StructType    |    org.apache.spark.sql.Row        |    DataTypes.createStructType(fields)
StructField   |    The value type in Java of the   |    DataTypes.createStructField(name, dataType, nullable)
              |    data type of this field (For    |
              |    example, int for a StructField  |
              |    with the data type IntegerType) |

One thing of note when working with StructTypes in particular: if you wish to declare an empty StructType inside another as a placeholder value, use new StructType() rather than the suggested DataTypes.createStructType((StructField)null), which can cause a NullPointerException. Remember to populate the nested StructType with StructFields before use.
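The factory-method column of the table can be exercised from Scala as well; a minimal sketch (assuming spark-sql on the classpath, with illustrative field names):

```scala
import org.apache.spark.sql.types._

// Build a schema with the DataTypes factory API from the table above.
val fields = Array(
  DataTypes.createStructField("id", DataTypes.LongType, false),
  DataTypes.createStructField("name", DataTypes.StringType, true),
  DataTypes.createStructField("tags",
    DataTypes.createArrayType(DataTypes.StringType, true), true)
)
val schema: StructType = DataTypes.createStructType(fields)

// An empty placeholder, per the note above: new StructType() rather than
// passing null to createStructType; populate it later with add().
val placeholder = new StructType()
val populated = placeholder.add("nested_id", DataTypes.IntegerType)
```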

bsplosion answered Oct 25 '22 22:10


Directly from the Spark SQL and DataFrame Guide:

Data type       |    Value type in Scala
------------------------------------------------
ByteType        |    Byte   
ShortType       |    Short  
IntegerType     |    Int    
LongType        |    Long   
FloatType       |    Float  
DoubleType      |    Double     
DecimalType     |    java.math.BigDecimal
StringType      |    String
BinaryType      |    Array[Byte]
BooleanType     |    Boolean 
TimestampType   |    java.sql.Timestamp
DateType        |    java.sql.Date
ArrayType       |    scala.collection.Seq   
MapType         |    scala.collection.Map   
StructType      |    org.apache.spark.sql.Row
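The Scala value types in the table can be checked by pulling values back out of a Row; a minimal sketch (assuming spark-sql on the classpath; the values are illustrative):

```scala
import org.apache.spark.sql.Row

// A hand-built Row holds exactly the external Scala types from the table above.
val row = Row(1.toByte, 42, 42L, "hello", Array[Byte](1, 2), Seq("a", "b"), Map("k" -> 1))

val b: Byte = row.getByte(0)                        // ByteType  -> Byte
val i: Int = row.getInt(1)                          // IntegerType -> Int
val l: Long = row.getLong(2)                        // LongType  -> Long
val s: String = row.getString(3)                    // StringType -> String
val bytes: Array[Byte] = row.getAs[Array[Byte]](4)  // BinaryType -> Array[Byte]
val seq: Seq[String] = row.getSeq[String](5)        // ArrayType -> scala.collection.Seq
val map: Map[String, Int] = row.getMap[String, Int](6).toMap // MapType -> scala.collection.Map
```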
zero323 answered Oct 25 '22 20:10