I tried using VectorAssembler on my Spark DataFrame and it complained that it doesn't support the StringType type. My DataFrame has 2126 columns.
What's the programmatic way to print out all the column types?
You can find all column names and data types (DataType) of a PySpark DataFrame using df.dtypes or df.schema, and you can retrieve the data type of a specific column with df.schema["name"].dataType.
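For illustration (a sketch with no live Spark session; the sample list below stands in for a real df.dtypes result), df.dtypes returns plain (name, type) string pairs, so a single column's type can be looked up with ordinary Python:

```python
# df.dtypes returns a list of (column_name, type_string) tuples.
# Sample data standing in for a real df.dtypes call:
dtypes = [("id", "bigint"), ("city", "string"), ("price", "double")]

# Look up one column's type, analogous to df.schema["city"].dataType:
type_by_name = dict(dtypes)
print(type_by_name["city"])  # string
```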
DataFrame.printSchema() prints the schema of the DataFrame in a tree format, showing each column name and its data type. If the DataFrame has a nested structure, the schema is displayed as a nested tree.
df.printSchema()
will print the DataFrame schema in an easy-to-follow tree format.
Try:
>>> for name, dtype in df.dtypes:
...     print(name, dtype)
or
>>> df.schema
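Since VectorAssembler rejects StringType columns, the same dtypes list can be used to keep only the non-string columns before assembling. A minimal sketch (the sample dtypes list stands in for a real df.dtypes; the VectorAssembler lines are commented out because they need a live DataFrame):

```python
# Sample data standing in for a real df.dtypes result:
dtypes = [("id", "bigint"), ("city", "string"), ("price", "double")]

# Keep only columns whose type is not string:
numeric_cols = [name for name, dtype in dtypes if dtype != "string"]
print(numeric_cols)  # ['id', 'price']

# With a real DataFrame df, these columns can then be assembled:
# from pyspark.ml.feature import VectorAssembler
# assembler = VectorAssembler(inputCols=numeric_cols, outputCol="features")
# assembled = assembler.transform(df)
```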