I have the following code in Spark-Python to get the list of names from the schema of a DataFrame, which works fine, but how can I get the list of the data types?
columnNames = df.schema.names
For example, something like:
columnTypes = df.schema.types
Is there any way to get a separate list of the data types contained in a DataFrame schema?
In Spark you can get all DataFrame column names and types (DataType) by using df.dtypes and df.schema, where df is a DataFrame object. Let's look at some examples of getting the data type and column name of all columns, and the data type of a selected column by name, using Scala.
In PySpark you can find all column names and data types (DataType) of a DataFrame by using df.dtypes and df.schema, and you can also retrieve the data type of a specific column with df.schema["name"].
To get the schema of a Spark DataFrame, call printSchema() on the DataFrame object. printSchema() prints the schema to the console (stdout), while show() displays the contents of the DataFrame.
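For instance, here is a minimal PySpark sketch of those three approaches (assuming a local SparkSession; the column names "name" and "value" are just for illustration):
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([("a", 1)], ["name", "value"])

# (column name, type string) pairs for every column
df.dtypes                      # [('name', 'string'), ('value', 'bigint')]

# DataType object of a specific column, via its StructField
df.schema["name"].dataType     # StringType (a DataType instance)

# print the full schema tree to the console
df.printSchema()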
Here's a suggestion:
df = sqlContext.createDataFrame([('a', 1)])
types = [f.dataType for f in df.schema.fields]
types
> [StringType, LongType]
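If you want the types as plain strings rather than DataType objects, a small follow-up sketch (reusing the df from the snippet above) can use simpleString() or the (name, type) pairs in df.dtypes:
type_strings = [f.dataType.simpleString() for f in df.schema.fields]
# or, equivalently, take the type half of each pair in df.dtypes
type_strings = [t for _, t in df.dtypes]
type_strings
> ['string', 'bigint']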
Since the question title is not Python-specific, I'll add a Scala version here:
val types = df.schema.fields.map(f => f.dataType)
It will result in an array of org.apache.spark.sql.types.DataType.