 

Get list of data types from schema in Apache Spark


I have the following PySpark code to get the list of column names from the schema of a DataFrame, which works fine, but how can I get the corresponding list of data types?

columnNames = df.schema.names 

For example, something like:

columnTypes = df.schema.types 

Is there any way to get a separate list of the data types contained in a DataFrame schema?

User2130 asked May 19 '16


People also ask

How do I get the DataType of a column in Spark?

In Spark you can get all DataFrame column names and their types (DataType) by using df.dtypes and df.schema, where df is the DataFrame object.
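
For instance, a minimal PySpark sketch of both accessors (the DataFrame and its column names here are made up for illustration, and the exact repr of the schema varies by Spark version):

from pyspark.sql import SparkSession

# Build a tiny illustrative DataFrame.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1)], ["letter", "number"])

df.dtypes   # (name, type) pairs as strings
> [('letter', 'string'), ('number', 'bigint')]
df.schema   # the full StructType
> StructType([StructField('letter', StringType(), True), StructField('number', LongType(), True)])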

How do you check data types in PySpark?

You can find all column names and data types (DataType) of a PySpark DataFrame by using df.dtypes and df.schema, and you can retrieve the data type of a specific column via df.schema["name"].dataType.
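
A short sketch of the per-column lookup (the column name "number" is illustrative): df.schema["name"] returns a StructField, and its .dataType attribute holds the actual DataType:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1)], ["letter", "number"])

df.schema["number"]            # a StructField, not the type itself
> StructField('number', LongType(), True)
df.schema["number"].dataType   # the DataType
> LongType()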

How do I display a schema in Spark?

To get the schema of a Spark DataFrame, call printSchema() on the DataFrame object. printSchema() prints the schema to the console (stdout), while show() displays the content of the DataFrame.
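
For example (a small sketch with a made-up two-column DataFrame):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1)], ["letter", "number"])

df.printSchema()   # prints the column tree to stdout
> root
>  |-- letter: string (nullable = true)
>  |-- number: long (nullable = true)

df.show()          # prints the data itself
> +------+------+
> |letter|number|
> +------+------+
> |     a|     1|
> +------+------+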


2 Answers

Here's a suggestion:

df = sqlContext.createDataFrame([('a', 1)])
types = [f.dataType for f in df.schema.fields]
types
> [StringType, LongType]
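
If you also need to know which column each type belongs to, pairing field names with types is a small follow-up sketch (assuming the same df as above; '_1' and '_2' are the default column names createDataFrame assigns when none are given):

# Pair each column name with its DataType.
name_types = [(f.name, f.dataType) for f in df.schema.fields]
name_types
> [('_1', StringType), ('_2', LongType)]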

Reference:

  • pyspark.sql.types.StructType
  • pyspark.sql.types.StructField
Daniel de Paula answered Sep 24 '22


Since the question title is not Python-specific, I'll add the Scala version here:

val types = df.schema.fields.map(f => f.dataType) 

This yields an Array[org.apache.spark.sql.types.DataType].

Viacheslav Shalamov answered Sep 23 '22