Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Databricks query performance when filtering on a column correlated to the partition-column

Setting: Delta-lake, Databricks SQL compute used by powerbi. I am wondering about the following scenario: We have a column timestamp and a derived column date (which is the date of timestamp), and we choose to partitionby date. When we query we use timestamp in the filter, not date.

My understanding is that databrikcs a priori wont connect the timestamp and the date, and seemingly wont get any advantage of the partitioning. But since the files are in fact partitioned by timestamps (implicitly), when databricks looks at the min/max timestamps of all the files, it will find that it can skip most files after all. So it seems like we can get quite a benefit of partitioning even if its on a column we dont explicitly use in the query.

  1. Is this correct?
  2. What is the performance cost (roughly) of having to filter away files in this way vs using the partitioning directly.
  3. Will databricks have all the min/max information in memory, or does it have to go out and look at the files for each query?
like image 575
epa095 Avatar asked Oct 24 '25 04:10

epa095


1 Answers

Yes, Databricks will take implicit advantage of this partitioning through data skipping because there will be min/max statistics associated with specific data files. The min/max information will be loaded into memory from the transaction log, but it will need to make decision which files it need to hit on every query. But because everything is in memory, it shouldn't be very big performance overhead, until you have hundreds of thousands files.

One thing that you may consider - use generated column instead of explicit date column. Declare it as date GENERATED ALWAYS AS (CAST(timestampColumn AS DATE)), and partition by it. The advantage is that when you're doing a query on timestampColumn, then it should do partition filtering on the date column automatically.

like image 135
Alex Ott Avatar answered Oct 25 '25 22:10

Alex Ott



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!