
Save a result of printSchema() function to variable in Pyspark?

I'm using the printSchema() function to infer the schema of a JSON file. I want to save the result of this function call in a variable and parse it line by line, so that I can extract the structure of the schema and convert it into a DDL schema for creating a table in Hive.

How can this be done?

LilyAZ asked Jan 26 '26 04:01


1 Answer

If you inspect the source code for printSchema(), you will see that this function just does the following:

print(self._jdf.schema().treeString())

Therefore, you can save the output as follows:

printSchemaString = df._jdf.schema().treeString()
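Since the goal is to parse that string line by line into Hive DDL, here is a minimal sketch of one way to do it. It runs on a hard-coded sample in the treeString() format, so it needs no Spark session. The names schema_text, SPARK_TO_HIVE, tree_string_to_columns, build_create_table and my_table are hypothetical, the type mapping is a partial one chosen for illustration, and only flat top-level fields are handled.

```python
import re

# Sample text in the format produced by treeString(). In a real session this
# would be the printSchemaString value captured above (assumption: flat schema).
schema_text = """root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- score: double (nullable = false)
"""

# Partial, illustrative mapping from Spark type names to Hive column types.
SPARK_TO_HIVE = {
    "string": "STRING",
    "integer": "INT",
    "long": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}

# Matches top-level field lines such as " |-- id: long (nullable = true)".
# Nested fields start with " |    |--" and therefore do not match.
FIELD_LINE = re.compile(
    r"^ \|-- (?P<name>[^:]+): (?P<type>\S+) \(nullable = (?:true|false)\)$"
)


def tree_string_to_columns(text):
    """Return (column_name, hive_type) pairs for the top-level fields."""
    columns = []
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match is None:
            continue  # skips the "root" line and any nested field lines
        spark_type = match.group("type")
        hive_type = SPARK_TO_HIVE.get(spark_type)
        if hive_type is None:
            # struct, array, map, decimal(...) etc. are outside this sketch
            raise ValueError(f"No Hive mapping for Spark type: {spark_type}")
        columns.append((match.group("name"), hive_type))
    return columns


def build_create_table(table_name, columns):
    """Assemble a CREATE TABLE statement from (name, type) pairs."""
    body = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE {table_name} (\n{body}\n)"


columns = tree_string_to_columns(schema_text)
ddl = build_create_table("my_table", columns)
print(ddl)
```

Nested structs and arrays would need extra handling based on the indentation depth of each line. For those cases it may be simpler to walk df.schema (a StructType with a fields list) than to parse the printed text.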

Other references:

  • Saving result of DataFrame show() to string in pyspark
  • Capturing the result of explain() in pyspark
pault answered Jan 28 '26 01:01


