How to get the column names and their data types of a Parquet file using PySpark?

Question

I have a Parquet file on my Hadoop cluster. I want to capture the column names and their data types and write them to a text file. How can I get the column names and data types of a Parquet file using PySpark?

zero323 · Accepted Answer

You can simply read the file and use schema to access individual fields:

sqlContext.read.parquet(path_to_parquet_file).schema.fields
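Since the question also asks for the result in a text file, here is a minimal sketch of how those fields could be written out. It assumes each field is a StructField exposing name and dataType (with dataType.simpleString() giving the short type name). The helper names schema_lines and write_schema and the output path "schema.txt" are made up for illustration. The file is written on the driver's local filesystem, not on HDFS.

```python
def schema_lines(fields):
    """Turn StructField-like objects into 'name<TAB>type' lines."""
    return ["{}\t{}".format(f.name, f.dataType.simpleString()) for f in fields]


def write_schema(fields, out_path):
    """Write one 'name<TAB>type' line per column to a local text file."""
    lines = schema_lines(fields)
    with open(out_path, "w") as out:
        out.write("\n".join(lines) + "\n")
    return lines


# Usage with PySpark (sqlContext and path_to_parquet_file as in the answer above;
# "schema.txt" is a hypothetical output path):
# fields = sqlContext.read.parquet(path_to_parquet_file).schema.fields
# write_schema(fields, "schema.txt")
```

Only the Parquet metadata is needed to build the schema, so this should not require scanning the data itself.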

tranquilram · Answer

Use dataframe.printSchema(), which prints the schema in a tree format:

df.printSchema()
root
 |-- age: integer (nullable = true)
 |-- name: string (nullable = true)

You can redirect the output of your program and capture that in a text file.
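If you would rather capture the tree from inside the script than redirect the whole program's output, something like the sketch below might work. It assumes printSchema() writes through Python's sys.stdout (it does so via print in the PySpark versions I have looked at, but treat that as an assumption). The helper names schema_tree_text and save_schema_tree and the path "schema_tree.txt" are hypothetical.

```python
import contextlib
import io


def schema_tree_text(df):
    """Return whatever df.printSchema() prints, as a string."""
    buffer = io.StringIO()
    # Temporarily send Python-level stdout into the in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        df.printSchema()
    return buffer.getvalue()


def save_schema_tree(df, out_path):
    """Write the printed schema tree to a local text file."""
    text = schema_tree_text(df)
    with open(out_path, "w") as out:
        out.write(text)
    return text


# Usage with a DataFrame df ("schema_tree.txt" is a hypothetical path):
# save_schema_tree(df, "schema_tree.txt")
```

Redirecting only around the printSchema() call keeps Spark's other console output out of the file.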

Tags:

apache-spark

pyspark

Shubham Mishra
