
How to write a DataFrame schema to file in Scala

I have a DataFrame that is loaded from a huge JSON file and infers its schema from it. The schema has around 1000 columns. I want the same output that printSchema() produces to be saved to a file instead of printed to the console.

Any ideas?

Sarah asked Jul 01 '16 05:07


People also ask

How do I create a schema for a DataFrame in spark?

We can create a DataFrame programmatically using the following three steps. Create an RDD of Rows from an Original RDD. Create the schema represented by a StructType matching the structure of Rows in the RDD created in Step 1. Apply the schema to the RDD of Rows via createDataFrame method provided by SQLContext.
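The three steps above can be sketched roughly as follows. This is only an illustration: the sample data, the column names and the object name ProgrammaticSchema are made up, and it assumes Spark 2.x or later, where SparkSession takes the place of the SQLContext mentioned above (sqlContext.createDataFrame(rows, schema) is the older equivalent).

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ProgrammaticSchema {
  // Hypothetical example data and column names, for illustration only.
  def build(spark: SparkSession): DataFrame = {
    // Step 1: create an RDD of Rows from an original RDD.
    val original = spark.sparkContext.parallelize(Seq(("alice", 30), ("bob", 25)))
    val rows = original.map { case (name, age) => Row(name, age) }

    // Step 2: describe the structure of those Rows with a StructType.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = false)
    ))

    // Step 3: apply the schema to the RDD of Rows.
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // A local session is assumed here purely so the sketch can run standalone.
    val spark = SparkSession.builder().master("local[*]").appName("schema-sketch").getOrCreate()
    try build(spark).printSchema()
    finally spark.stop()
  }
}
```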

How do I print a schema of a DataFrame in spark?

To print the schema of a Spark DataFrame, call printSchema() on the DataFrame object. printSchema() prints the schema to the console (stdout), while show() displays the content of the DataFrame.


2 Answers

You can do the following if you are working in a local environment:

import java.io.PrintWriter
val filePath = "/path/to/file/schema_file"
// df.schema.treeString is the same text that printSchema() prints
new PrintWriter(filePath) { write(df.schema.treeString); close() }

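If you are on HDFS, a plain local path will not work with java.io.PrintWriter; you'll need to provide an HDFS URI and write through the Hadoop FileSystem API.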
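For the HDFS case, one possible approach is to write the schema string through the Hadoop FileSystem API. The sketch below is an assumption about how that could look, not part of the original answer: the object SchemaToHdfs, the helper writeText and the example URI are hypothetical, and it assumes the Hadoop client classes are on the classpath (they normally are in a Spark application).

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SchemaToHdfs {
  // Hypothetical helper: writes `text` to the file identified by `uri`,
  // e.g. "hdfs://namenode:8020/tmp/schema.txt" (host and port are made up).
  def writeText(uri: String, text: String, conf: Configuration = new Configuration()): Unit = {
    // Pick the FileSystem implementation that matches the URI scheme.
    val fs = FileSystem.get(new URI(uri), conf)
    // The second argument asks Hadoop to overwrite an existing file.
    val out = fs.create(new Path(uri), true)
    try out.write(text.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }
}

// Usage, assuming `df` is the DataFrame from the question:
// SchemaToHdfs.writeText("hdfs://namenode:8020/tmp/schema.txt", df.schema.treeString)
```

Inside a Spark job it is probably better to pass the cluster's own settings, for example spark.sparkContext.hadoopConfiguration, instead of a fresh Configuration.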

eliasah answered Sep 28 '22 23:09


This is the body of printSchema():

  /**
   * Prints the schema to the console in a nice tree format.
   * @group basic
   * @since 1.3.0
   */
  // scalastyle:off println
  def printSchema(): Unit = println(schema.treeString)
  // scalastyle:on println

Since printSchema() simply calls println, it cannot be pointed at a file directly. As a workaround, you can redirect the standard output stream to a file stream so that the schema is written to your file.

Something like this:

import java.io.{FileOutputStream, PrintStream}
val out = new PrintStream(new FileOutputStream("output.txt"))
System.setOut(out)
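One caveat, as far as I can tell: Scala's println writes to Console.out, which is normally captured once when Console is first used, so a later System.setOut call may not affect what printSchema() prints. A scoped alternative is Console.withOut, which redirects println only for the duration of a block. The sketch below is an assumption, not part of the original answer; the object RedirectPrintln and the helper printlnToFile are hypothetical names.

```scala
import java.io.{FileOutputStream, PrintStream}

object RedirectPrintln {
  // Hypothetical helper: runs `body` with Scala's Console.out pointed at the
  // file `path`, then closes the file. Output outside `body` is unaffected.
  def printlnToFile(path: String)(body: => Unit): Unit = {
    val out = new PrintStream(new FileOutputStream(path), true, "UTF-8")
    try Console.withOut(out)(body)
    finally out.close()
  }
}

// Usage, assuming `df` is the DataFrame from the question:
// RedirectPrintln.printlnToFile("output.txt") { df.printSchema() }
```

Because the redirection is limited to the block, the console goes back to normal afterwards without having to save and restore the previous stream.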

I hope this solves your problem!

Shivansh answered Sep 28 '22 23:09