I have a DataFrame that is loaded from a huge JSON file and infers its schema from it. The schema has around 1000 columns. I want the same output that printSchema produces to be saved to a file instead of printed to the console.
Any ideas?
We can create a DataFrame programmatically using the following three steps:
1. Create an RDD of Rows from the original RDD.
2. Create the schema, represented by a StructType, matching the structure of the Rows in the RDD created in step 1.
3. Apply the schema to the RDD of Rows via the createDataFrame method provided by SQLContext.
To get the schema of a Spark DataFrame, call printSchema() on the DataFrame object. printSchema() prints the schema to the console (stdout), while show() displays the contents of the DataFrame.
If you are working in a local environment, you can write the schema's tree string to a file directly:
import java.io.PrintWriter
val filePath = "/path/to/file/schema_file"
new PrintWriter(filePath) { write(df.schema.treeString); close() }
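If you are on HDFS, you'll need to provide a URI. Note that java.io.PrintWriter only writes to the local filesystem, so in that case the write has to go through the Hadoop FileSystem API rather than PrintWriter.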
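For the local-file case, here is a minimal sketch of a slightly more defensive variant. It assumes you pass in the string returned by df.schema.treeString; the object name SchemaFile, the method save, and the sample tree text are hypothetical and only there for illustration. It uses java.nio's Files.write, which opens and closes the file on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object SchemaFile {
  // Writes `text` to `path` as UTF-8, creating or overwriting the file.
  // Files.write opens and closes the file itself, so no handle is left
  // open if the write fails.
  def save(path: String, text: String): Unit = {
    Files.write(Paths.get(path), text.getBytes(StandardCharsets.UTF_8))
    ()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for df.schema.treeString; the real string comes from Spark.
    val tree = "root\n |-- id: long (nullable = true)\n"
    val target = Files.createTempFile("schema_file", ".txt")
    save(target.toString, tree)
    println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8))
  }
}
```

With a real DataFrame, the call would look like SchemaFile.save(filePath, df.schema.treeString).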
This is the body of printSchema():
/**
* Prints the schema to the console in a nice tree format.
* @group basic
* @since 1.3.0
*/
// scalastyle:off println
def printSchema(): Unit = println(schema.treeString)
// scalastyle:on println
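So printSchema() itself can't be pointed at a file, but there is a workaround that may work in your case: set the standard output stream to a file stream so that the schema gets printed to your file. Keep in mind that this redirects everything written to stdout until you restore the original stream, and that Scala's println goes through scala.Console, which may already hold the original stream, so the redirect may not take effect in every setup.
Something like this:
import java.io.{FileOutputStream, PrintStream}
val out = new PrintStream(new FileOutputStream("output.txt"))
System.setOut(out)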
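Since printSchema() calls Scala's println, which writes to scala.Console, a scoped redirect with Console.withOut is an alternative worth trying. The sketch below is self-contained and does not use Spark: printSchemaStandIn is a hypothetical stand-in for df.printSchema(), and RedirectDemo and captureTo are names made up for this example.

```scala
import java.io.{File, FileOutputStream, PrintStream}
import scala.io.Source

object RedirectDemo {
  // Hypothetical stand-in for df.printSchema(): it simply calls println,
  // as the method body quoted above does.
  def printSchemaStandIn(): Unit =
    println("root\n |-- id: long (nullable = true)")

  // Runs `body` with Scala's Console output sent to the file at `path`.
  // Console.withOut restores the previous output once `body` returns.
  def captureTo(path: String)(body: => Unit): Unit = {
    val out = new PrintStream(new FileOutputStream(path))
    try Console.withOut(out)(body)
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("schema", ".txt")
    captureTo(file.getPath) { printSchemaStandIn() }
    // Back on the normal console: show what was captured.
    val src = Source.fromFile(file)
    try print(src.mkString) finally src.close()
  }
}
```

With a real DataFrame, the body passed to captureTo would be df.printSchema(). Unlike System.setOut, the redirect only applies inside the block, so later console output is unaffected.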
I hope this solves your query!