The different DataTypes available for Spark SQL can be found here. Can anyone please tell me what would be the corresponding Java/Scala data type for each of Spark SQL's DataTypes?
Hive integration: Run SQL or HiveQL queries on existing warehouses. Spark SQL supports the HiveQL syntax and can use existing Hive metastores, SerDes, and UDFs, allowing you to access existing Hive warehouses.
Spark map() is a transformation that applies a function to every element of an RDD, DataFrame, or Dataset and returns a new RDD or Dataset.
Spark SQL allows relational queries expressed in SQL, HiveQL, or Scala to be executed using Spark. At the core of this component is a new type of RDD, SchemaRDD. SchemaRDDs are composed of Row objects, along with a schema that describes the data types of each column in the row.
I read that Spark SQL has three complex data types: ArrayType, MapType, and StructType.
For those trying to find the Java types, they're now also hosted at the link from zero323's answer. To document the current revision here:
Data type | Value type in Java | API to access or create a data type
-------------------------------------------------------------------------------------------
ByteType | byte or Byte | DataTypes.ByteType
ShortType | short or Short | DataTypes.ShortType
IntegerType | int or Integer | DataTypes.IntegerType
LongType | long or Long | DataTypes.LongType
FloatType | float or Float | DataTypes.FloatType
DoubleType | double or Double | DataTypes.DoubleType
DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale)
StringType | String | DataTypes.StringType
BinaryType | byte[] | DataTypes.BinaryType
BooleanType | boolean or Boolean | DataTypes.BooleanType
TimestampType | java.sql.Timestamp | DataTypes.TimestampType
DateType | java.sql.Date | DataTypes.DateType
ArrayType | java.util.List | DataTypes.createArrayType(elementType) or DataTypes.createArrayType(elementType, containsNull)
MapType | java.util.Map | DataTypes.createMapType(keyType, valueType) or DataTypes.createMapType(keyType, valueType, valueContainsNull)
StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields)
StructField | The value type in Java of the data type of this field (for example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable)
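To make the Java column of the table above easier to use in code, here is a minimal sketch that stores the mapping as a lookup and checks whether a value has the expected Java class. It uses only the JDK, so it runs without Spark on the classpath. The class `SparkJavaTypes` and its methods `valueTypeFor` and `isCompatible` are hypothetical helpers invented for illustration and are not part of the Spark API. StructType is left out because its value type, org.apache.spark.sql.Row, needs the Spark jars.

```java
import java.math.BigDecimal;
import java.sql.Date;
import java.sql.Timestamp;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Spark): mirrors the "Value type in Java"
// column of the table, keyed by the Spark SQL data type name.
class SparkJavaTypes {
    private static final Map<String, Class<?>> VALUE_TYPES = new LinkedHashMap<>();

    static {
        // Boxed classes are used so that Class.isInstance works on autoboxed values.
        VALUE_TYPES.put("ByteType", Byte.class);
        VALUE_TYPES.put("ShortType", Short.class);
        VALUE_TYPES.put("IntegerType", Integer.class);
        VALUE_TYPES.put("LongType", Long.class);
        VALUE_TYPES.put("FloatType", Float.class);
        VALUE_TYPES.put("DoubleType", Double.class);
        VALUE_TYPES.put("DecimalType", BigDecimal.class);
        VALUE_TYPES.put("StringType", String.class);
        VALUE_TYPES.put("BinaryType", byte[].class);
        VALUE_TYPES.put("BooleanType", Boolean.class);
        VALUE_TYPES.put("TimestampType", Timestamp.class);
        VALUE_TYPES.put("DateType", Date.class);
        VALUE_TYPES.put("ArrayType", List.class);
        VALUE_TYPES.put("MapType", Map.class);
        // StructType -> org.apache.spark.sql.Row is omitted to stay JDK-only.
    }

    // Returns the Java class listed in the table for the given type name.
    static Class<?> valueTypeFor(String sparkTypeName) {
        Class<?> type = VALUE_TYPES.get(sparkTypeName);
        if (type == null) {
            throw new IllegalArgumentException("No mapping for: " + sparkTypeName);
        }
        return type;
    }

    // Assumption: null is accepted for every type, as if the field were nullable.
    static boolean isCompatible(String sparkTypeName, Object value) {
        return value == null || valueTypeFor(sparkTypeName).isInstance(value);
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("IntegerType", 42));
        System.out.println(isCompatible("LongType", 42));
        System.out.println(isCompatible("DecimalType", new BigDecimal("1.50")));
    }
}
```

Note that an int literal such as 42 autoboxes to Integer, so it is not accepted for LongType. You would have to pass 42L instead.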
One thing to note when working with StructTypes in particular: it appears that if you wish to declare an empty StructType inside another as a placeholder value, you must use new StructType() rather than the suggested DataTypes.createStructType((StructField)null) to prevent null pointer exceptions. Remember to instantiate the nested StructType with StructFields prior to usage.
Directly from the Spark SQL and DataFrame Guide:
Data type | Value type in Scala
------------------------------------------------
ByteType | Byte
ShortType | Short
IntegerType | Int
LongType | Long
FloatType | Float
DoubleType | Double
DecimalType | java.math.BigDecimal
StringType | String
BinaryType | Array[Byte]
BooleanType | Boolean
TimestampType | java.sql.Timestamp
DateType | java.sql.Date
ArrayType | scala.collection.Seq
MapType | scala.collection.Map
StructType | org.apache.spark.sql.Row