 

Print out types of data frame columns in Spark

Tags:

pyspark

I tried using VectorAssembler on my Spark DataFrame and it complained that it doesn't support the StringType type. My DataFrame has 2126 columns.

What's the programmatic way to print out all the column types?

Huey asked Sep 24 '16 01:09

People also ask

How do I get the DataType of a column in Spark?

You can find all column names and data types (DataType) of a PySpark DataFrame using df.dtypes and df.schema, and you can retrieve the data type of a specific column with df.schema["name"].dataType.

How do I list all columns of a DataFrame in Spark?

Example 1 – Spark Convert DataFrame Column to List. To convert a Spark DataFrame column to a List, first select() the column you want, then use the Spark map() transformation to convert each Row to a String, and finally collect() the data to the driver, which returns an Array[String].

How do I print the schema of a DataFrame in PySpark?

DataFrame.printSchema() is used to print or display the schema of the DataFrame in tree format, along with column names and data types. If the DataFrame has a nested structure, the schema is displayed as a nested tree.

How do you display columns in PySpark?

Example 3: Showing full column content of a PySpark DataFrame using the show() function. To show the full column content, call show() with truncate=False, passing the row count so every row is displayed: df.show(df.count(), truncate=False).


2 Answers

df.printSchema() will print the DataFrame schema in an easy-to-follow format.

RodiX answered Oct 14 '22 03:10


Try:

>>> for name, dtype in df.dtypes:
...     print(name, dtype)

or

>>> df.schema
user6022341 answered Oct 14 '22 02:10