filter pushdown using spark-sql on map type column in parquet

I am trying to store my data in a nested way in Parquet, using a map type column to store complex objects as values.

Could somebody let me know whether filter pushdown works on map type columns or not? For example, below is my SQL query:

`select measureMap['CR01'].tenorMap['1M'] from RiskFactor where businessDate='2016-03-14' and bookId='FI-UK'`

measureMap is a map with a String key and a custom data type as value, containing 2 attributes - a String and another map of String-to-Double pairs.

I want to know whether pushdown will work on the map or not, i.e. if the map has 10 key-value pairs, will Spark bring the whole map's data into memory and create the object model, or will it filter out the data depending on the key at the I/O read level?

Also, I want to know whether there is any way to specify the key in the where clause, something like `where measureMap.key = 'CR01'`?
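
For reference, here is a minimal Scala sketch of the schema described above. The field name `label` is a hypothetical stand-in for the String attribute; only `measureMap`, `tenorMap`, `businessDate` and `bookId` come from the question.

```scala
import org.apache.spark.sql.types._

// Nested map: tenor -> value (String -> Double)
val tenorMapType = MapType(StringType, DoubleType)

// Custom value type: one String attribute plus the nested map.
// The name "label" is illustrative, not from the original data model.
val measureValueType = StructType(Seq(
  StructField("label", StringType),
  StructField("tenorMap", tenorMapType)
))

val riskFactorSchema = StructType(Seq(
  StructField("businessDate", StringType),
  StructField("bookId", StringType),
  StructField("measureMap", MapType(StringType, measureValueType))
))
```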

Vijayendra Bhati · asked Jun 21 '16 08:06


People also ask

What is filter pushdown in Spark?

A predicate push down filters the data in the database query, reducing the number of entries retrieved from the database and improving query performance. By default the Spark Dataset API will automatically push down valid WHERE clauses to the database.
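As a sketch of this behavior, the snippet below reads from a hypothetical JDBC source (the URL, table name, and `local[*]` setup are placeholders; a JDBC driver must be on the classpath) and applies a filter that Spark translates into a WHERE clause executed by the database:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("pushdown-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Placeholder connection details; any JDBC-compatible database works.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/riskdb") // hypothetical
  .option("dbtable", "risk_factor")                         // hypothetical
  .load()

// This filter is pushed down and executed by the database, not by Spark.
val filtered = df.filter($"businessDate" === "2016-03-14")

// The physical plan lists the pushed predicate under PushedFilters.
filtered.explain()
```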

Does parquet support predicate pushdown?

Parquet allows for predicate pushdown filtering, a form of query pushdown because the file footer stores row-group level metadata for each column in the file.
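A small sketch of this with a top-level column, assuming the spark-shell where `spark` and its implicits are available (the path `/tmp/riskfactor` is arbitrary):

```scala
import spark.implicits._

// Write a tiny parquet file with top-level columns, then filter on one.
Seq(("2016-03-14", "FI-UK", 1.0), ("2016-03-15", "FI-US", 2.0))
  .toDF("businessDate", "bookId", "value")
  .write.mode("overwrite").parquet("/tmp/riskfactor")

val pushed = spark.read.parquet("/tmp/riskfactor")
  .filter($"businessDate" === "2016-03-14")

// explain() shows the predicate under PushedFilters, e.g.
// [IsNotNull(businessDate), EqualTo(businessDate,2016-03-14)].
// The reader can then skip row groups using the footer's min/max stats.
pushed.explain()
```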

Which option can be used in Spark SQL if you need to use an in memory columnar structure to cache tables?

Spark SQL can cache tables using an in-memory columnar format by calling `spark.catalog.cacheTable("tableName")` or `dataFrame.cache()`.
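A minimal usage sketch, assuming an existing DataFrame `df` and the spark-shell's `spark` session:

```scala
// Cache by table name via the catalog...
df.createOrReplaceTempView("riskfactor")
spark.catalog.cacheTable("riskfactor")

// ...or directly on the DataFrame handle.
df.cache()
```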

What is PushDownPredicate?

PushDownPredicate is a base logical optimization that pushes Filter operators down a logical query plan, closer to the data source. PushDownPredicate is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Catalyst Optimizer.
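To see the optimizer's work directly, you can inspect the plans on any Dataset (a sketch, again assuming the spark-shell and the `/tmp/riskfactor` file written above):

```scala
import spark.implicits._

val q = spark.read.parquet("/tmp/riskfactor")
  .filter($"bookId" === "FI-UK")

// The optimized logical plan shows the Filter placed next to the relation.
println(q.queryExecution.optimizedPlan)

// explain(true) prints the parsed, analyzed, optimized and physical plans.
q.explain(true)
```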


1 Answer

The short answer is no. Parquet predicate pushdown doesn't work on MapType columns or on fields nested inside a Parquet structure.
Spark's Catalyst optimizer only understands top-level columns in the Parquet data. It uses the column type, the column's data range, the encoding, etc. to finally generate the whole-stage code for the query.
When the data is in a MapType format, it is not possible to get this information from the column. A map could hold hundreds of key-value pairs, and with the current Spark infrastructure it is impossible to do a predicate pushdown over them.
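A possible workaround, which follows directly from the above: promote the map entries you filter on to top-level columns before writing, so Catalyst can push predicates on them. A sketch, assuming a DataFrame `df` with the question's schema and the hypothetical `label` field from earlier:

```scala
import spark.implicits._

// Pull a specific map entry out into an ordinary top-level column.
val flattened = df
  .withColumn("cr01Label", $"measureMap".getItem("CR01").getField("label"))
  .select("businessDate", "bookId", "cr01Label")

flattened.write.mode("overwrite").parquet("/tmp/riskfactor_flat")

// Predicates on the promoted column are now eligible for pushdown.
spark.read.parquet("/tmp/riskfactor_flat")
  .filter($"cr01Label" === "someValue")
  .explain()
```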

Avishek Bhattacharya · answered Sep 27 '22 23:09