Vertica and joins

Tags:

I'm adapting a web analysis tool to use Vertica as the DB. I'm having real problems optimizing joins. I tried creating pre-join projections for some of my queries, and while it did make the queries blazing fast, it slowed data loading into the fact table to a crawl.

A simple INSERT INTO ... SELECT * FROM which we use to load data into the fact table from a staging table goes from taking ~5 seconds to taking 20+ minutes.

Because of this I dropped all pre-join projections and tried using the Database Designer to design query specific projections but it's not enough. Even with those projections a simple join is taking ~14 seconds, something that takes ~1 second with a pre-join projection.

My question is this: Is it normal for a pre-join projection to slow data insertion this much and if not, what could be the culprit? If it is normal, then it's a show stopper for us and are there other techniques we could use to speed up the joins?

We're running Vertica on a 5 node cluster, each node having 2 x quad core CPU and 32 GB of memory. The tables in my example query have 188,843,085 and 25,712,878 rows respectively.

The EXPLAIN output looks like this:

EXPLAIN SELECT referer_via_.url as referralPageUrl, COUNT(DISTINCT sessio
n.id) as visits FROM owa_session as session JOIN owa_referer AS referer_vi
a_ ON session.referer_id = referer_via_.id WHERE session.yyyymmdd BETWEEN 
'20121123' AND '20121123' AND session.site_id = '49' GROUP BY referer_via_
.url  ORDER BY visits DESC LIMIT 250;

Access Path:
+-SELECT  LIMIT 250 [Cost: 1M, Rows: 250 (STALE STATISTICS)] (PATH ID: 0)
|  Output Only: 250 tuples
|  Execute on: Query Initiator
| +---> SORT [Cost: 1M, Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
| |      Order: count(DISTINCT "session".id) DESC
| |      Output Only: 250 tuples
| |      Execute on: All Nodes
| | +---> GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 1M, Rows: 1 (STALE 
STATISTICS)] (PATH ID: 2)
| | |      Aggregates: count(DISTINCT "session".id)
| | |      Group By: referer_via_.url
| | |      Execute on: All Nodes
| | | +---> GROUPBY HASH (SORT OUTPUT) (RESEGMENT GROUPS) [Cost: 1M, Rows
: 1 (STALE STATISTICS)] (PATH ID: 3)
| | | |      Group By: referer_via_.url, "session".id
| | | |      Execute on: All Nodes
| | | | +---> JOIN HASH [Cost: 1M, Rows: 1 (STALE STATISTICS)] (PATH ID: 
4) Outer (RESEGMENT)
| | | | |      Join Cond: ("session".referer_id = referer_via_.id)
| | | | |      Execute on: All Nodes
| | | | | +-- Outer -> STORAGE ACCESS for session [Cost: 463, Rows: 1 (ST
ALE STATISTICS)] (PUSHED GROUPING) (PATH ID: 5)
| | | | | |      Projection: public.owa_session_projection
| | | | | |      Materialize: "session".id, "session".referer_id
| | | | | |      Filter: ("session".site_id = '49')
| | | | | |      Filter: (("session".yyyymmdd >= 20121123) AND ("session"
.yyyymmdd <= 20121123))
| | | | | |      Execute on: All Nodes
| | | | | +-- Inner -> STORAGE ACCESS for referer_via_ [Cost: 293K, Rows:
26M] (PATH ID: 6)
| | | | | |      Projection: public.owa_referer_DBD_1_seg_Potency_2012112
2_Potency_20121122
| | | | | |      Materialize: referer_via_.id, referer_via_.url
| | | | | |      Execute on: All Nodes

752

asked Nov 23 '12 14:11

user997904

1 Answers

To speedup join:

Design session table as being partitioned on column "yyyymmdd". This will enable partition pruning
Add condition on column "yyyymmdd" to _referer_via_ and partition on it, if it is possible (most likely not)
have column site_id as possible close to the beginning of order by list in used (super)projection of session
- have both tables segmented on referer_id and id correspondingly.

And having more nodes in cluster do help.

answered Sep 19 '22 19:09

Sergey Cherepanov

Related questions
                            
                                Custom Date/Time formatting in SQL Server
                            
                                How do I to insert data into an SQL table using C# as well as implement an upload function?
                            
                                MySQL Select First Day of Year and Month
                            
                                trim left characters in sql server?
                            
                                Simulate a left join without using "left join"
                            
                                Opposite of SELECT TOP?
                            
                                replace 0 with null in mysql
                            
                                MySQL create time and update time timestamp
                            
                                SQLite: autoincrement primary key questions
                            
                                Access-SQL: Inner Join with multiple tables
                            
                                Append Results from two queries and output as a single table
                            
                                IsEmpty function like ISNULL in SQL Server?
                            
                                Why in SQL NULL can't match with NULL?
                            
                                Why do we care about data types?
                            
                                mySQL SELECT upcoming birthdays
                            
                                Laravel select records older than 5 minutes?
                            
                                SQL Server Object Explorer in VS2015 Professional not showing Azure database
                            
                                SQL OpenXML Multiple Tag Issue
                            
                                QSqlField name() method returns ""
                            
                                Slick 3.1 - Printing SQL from DBIOAction (insert statements)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Vertica and joins

Tags:

sql

join

vertica

user997904

People also ask

1 Answers

Sergey Cherepanov

Recent Activity

Donate For Us