
How to read Parquet file using Spark Core API?


I know Spark SQL provides methods for reading Parquet files, but we cannot use Spark SQL in our project.

Do we have to use the newAPIHadoopFile method on JavaSparkContext to do this?

I am using Java to implement the Spark job.

Shankar asked Sep 02 '15 10:09




1 Answer

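Use the code below:

import org.apache.spark.sql.*;  // SparkSession, Dataset, Row

SparkSession spark = SparkSession.builder().master("yarn").appName("Application").enableHiveSupport().getOrCreate();
Dataset<Row> ds = spark.read().parquet(filename);  // filename: path to the Parquet file or directory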
developer.raj answered Sep 22 '22 16:09
