Does Parquet's predicate pushdown mean that only the data that is required is actually loaded from disk?
E.g., if I create a Spark DataFrame and only select
particular fields, will only those fields be read from disk?
A predicate pushdown filters the data at the source of the query, reducing the number of entries retrieved from the database and improving query performance. By default, the Spark Dataset API will automatically push down valid WHERE clauses to the database.
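A minimal PySpark sketch of that behaviour, assuming a hypothetical PostgreSQL table reachable over JDBC (URL, credentials, table and column names are all placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("jdbc-pushdown-demo").getOrCreate()

# Hypothetical JDBC source; connection details are placeholders.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/shop")
          .option("dbtable", "orders")
          .option("user", "reader")
          .option("password", "secret")
          .load())

# The filter is translated into SQL and executed by the database,
# so only matching rows travel over the wire. The scan node in the
# physical plan lists it under PushedFilters.
recent = orders.filter(F.col("order_date") >= "2023-01-01")
recent.explain()
```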
Parquet allows for predicate pushdown filtering, a form of query pushdown, because the file footer stores row-group-level metadata (including min/max statistics) for each column in the file.
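You can inspect that footer metadata directly. A short sketch with pyarrow (the file path is a placeholder):

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("/data/events.parquet")  # hypothetical file

# The footer holds per-row-group, per-column metadata, including the
# min/max statistics that predicate pushdown compares filters against.
for rg in range(pf.metadata.num_row_groups):
    for col_idx in range(pf.metadata.num_columns):
        col = pf.metadata.row_group(rg).column(col_idx)
        stats = col.statistics
        if stats is not None:
            print(rg, col.path_in_schema, stats.min, stats.max)
```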
Predicate pushdown is a data-processing technique that takes user-defined filters and executes them while reading the data. Apache Spark already supported it for Apache Parquet and RDBMS sources. Starting from Apache Spark 3.1.1, you can also use it for the Apache Avro, JSON and CSV formats!
Predicate pushdown deals with what values will be scanned, not what columns. So, if you apply a filter on column A to only return records with value V, the predicate pushdown will make Parquet read only blocks that may contain value V. Parquet holds min/max statistics at several levels, and it will compare the value V to those min/max headers and only scan blocks whose min/max range can contain V. That is predicate pushdown.
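A minimal PySpark sketch of this, assuming a hypothetical Parquet dataset at /data/events.parquet with a string column A:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-pushdown-demo").getOrCreate()

# Hypothetical Parquet dataset with a string column A (path is a placeholder).
df = spark.read.parquet("/data/events.parquet")

# The filter is pushed into the Parquet reader: row groups whose
# min/max statistics for A cannot contain "V" are skipped entirely.
filtered = df.filter(F.col("A") == "V")

# The physical plan shows the pushed predicate, e.g.
#   PushedFilters: [IsNotNull(A), EqualTo(A,V)]
filtered.explain()
```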
Another thing with Parquet is "projection pushdown": it stores data in columns, so when your projection limits the query to certain columns, only those columns will be read from disk. This feature is not what is called predicate pushdown, though.
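A matching sketch of projection pushdown on the same hypothetical dataset, where the pruned read schema is visible in the plan:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("projection-demo").getOrCreate()

# Same hypothetical Parquet dataset as above.
df = spark.read.parquet("/data/events.parquet")

# Column pruning: only column A's chunks are decoded from the file;
# the other columns are never read.
projected = df.select("A")

# The scan node's ReadSchema reflects the pruned schema, e.g.
#   ReadSchema: struct<A:string>
projected.explain()
```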