I have a table T1 with 60 rows and 5 columns: ID1, ID2, info1, info2, info3.
I have a table T2 with 1.2 million rows and another 5 columns: ID3, ID2, info4, info5, info6.
I want to get (ID1, ID2, info4, info5, info6) from all the rows where the ID2s match up. Currently my query looks like this:
SELECT T1.ID1, T2.ID2,
T2.info4, T2.info5, T2.info6
FROM T1, T2
WHERE T1.ID2 = T2.ID2;
This takes about 15 seconds to run. My question is - should it take that long, and if not, how can I speed it up? I figure it shouldn't since T1 is so small.
I asked PostgreSQL to EXPLAIN the query, and it says that it hashes T2, then hash joins that hash with T1. It seems hashing T2 is what takes so long. Is there any way to write the query so it doesn't have to hash T2? Or, is there a way to have it cache the hash of T2 so it doesn't re-do it? The tables will only be updated every few days.
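For reference, a hedged way to confirm that building the hash over T2 is what dominates the runtime is to ask for per-node timings with EXPLAIN (ANALYZE, BUFFERS). A sketch using the query from the question (lower-case identifiers assumed):

-- Show actual per-node timings and buffer usage for the join;
-- the Hash node over t2 should account for most of the runtime.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.id1, t2.id2, t2.info4, t2.info5, t2.info6
FROM t1, t2
WHERE t1.id2 = t2.id2;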
If it makes a difference, T1 is a temporary table created earlier in the session.
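A side note, hedged: PostgreSQL's autovacuum never analyzes temporary tables, so a temp table created in-session may have no statistics at all. Collecting them manually after the table is filled gives the planner accurate row counts (table name taken from the question):

-- Collect statistics on the temporary table so the planner
-- knows it holds only ~60 rows.
ANALYZE t1;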
In the context of the question, a single large query will be faster than, say, a SELECT inside a programming loop (the worst possible scenario), no matter which RDBMS is used.
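As a hedged illustration of that worst case (the DO block below is hypothetical and not from the question), the loop version issues one lookup against T2 per row of T1 instead of a single join:

-- Anti-pattern sketch: 60 separate probes of t2, one per row of t1.
DO $$
DECLARE
    r RECORD;
    v RECORD;
BEGIN
    FOR r IN SELECT id1, id2 FROM t1 LOOP
        SELECT info4, info5, info6 INTO v
        FROM t2
        WHERE t2.id2 = r.id2;   -- one probe of t2 per iteration
        -- ... use r and v here ...
    END LOOP;
END $$;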
Method 1: Relational Algebra

Relational algebra is the most common way of writing a query and also the most natural way to do so. The code is clean, easy to troubleshoot, and, unsurprisingly, it is also the most efficient way to join two tables.
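A hedged sketch of this style using the tables from the question (lower-case identifiers assumed):

-- Relational-algebra (comma) style: the join condition lives in WHERE.
-- For two tables, PostgreSQL normally plans this the same as the
-- equivalent explicit JOIN ... ON.
SELECT t1.id1, t1.id2, t2.info4, t2.info5, t2.info6
FROM   t1, t2
WHERE  t1.id2 = t2.id2;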
It should not take that long :)
Creating an index on T2(ID2) should improve the performance of your query:
CREATE INDEX idx_t2_id2 ON t2 (id2);
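Once the index exists and the tables have fresh statistics, re-running EXPLAIN should show whether the planner switches from hashing all of T2 to a nested loop driven by the 60-row T1 with index lookups into T2. A hedged check, using the query from the question:

-- Refresh statistics, then re-check the plan; look for an Index Scan
-- (or Bitmap Index Scan) on idx_t2_id2 instead of a Hash node over t2.
ANALYZE t2;
EXPLAIN
SELECT t1.id1, t2.id2, t2.info4, t2.info5, t2.info6
FROM t1, t2
WHERE t1.id2 = t2.id2;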