I have a query in Oracle 11g like this:
SELECT *
FROM CATAT, CG, CCSD
WHERE CATAT.ID = 1007642
AND CG.C_ID = CATAT.ID
AND CATAT.IS_PARENT = 1
AND CCSD.G_ID = CG.ID
The query, in this instance, comes back with zero rows, and does so pretty much instantly. However, if I change it to this:
SELECT COUNT(*) AS ROW_COUNT
FROM CATAT, CG, CCSD
WHERE CATAT.ID = 1007642
AND CG.C_ID = CATAT.ID
AND CATAT.IS_PARENT = 1
AND CCSD.G_ID = CG.ID
It NEVER comes back - I have left the query running for over 5 minutes and it still hasn't finished. In fact, anything except SELECT * takes an extremely long time to run. E.g. SELECT CG.ID FROM...
, or SELECT CATAT.* FROM...
The only thing unusual about this query is that the CCSD table has millions of rows of data in it. There is an index on CCSD.G_ID
, so it can't be a lack of indexes.
I just don't understand why a query that returns zero rows instantly with a SELECT *
should take so long if you do anything other than that? Can anyone shed some light on this?
Here is the explain plan for the SELECT * FROM...
query:
Here is the explain plan for the SELECT COUNT(*) FROM...
query:
According to this theory, COUNT(*) takes all columns to count rows and COUNT(1) counts using the first column: Primary Key. Thanks to that, COUNT(1) is able to use index to count rows and it's much faster.
The simple answer is no – there is no difference at all. The COUNT(*) function counts the total rows in the table, including the NULL values.
So to make SELECT COUNT(*) queries fast, here's what to do: Get on any version that supports batch mode on columnstore indexes, and put a columnstore index on the table – although your experiences are going to vary dramatically depending on the kind of query you have.
If you are evaluating the performance of your query in some SQL development environment like Toad or SQL Developer it's not a true comparison. Most IDE's fetch the first n rows (usually 50 rows). by wrapping your query with a
SELECT * FROM (your query) WHERE ROWNUM <= 50
Often with a stopkey hint. This means the DB only fetches the first 50 rows and stops. However, your SELECT COUNT(*) FROM ...
is forcing the DB to actually count each and every row the query returns and that takes as long as it takes.
Edit: I was thinking of another Oracle product (Apex) when I stated your SQL Developer query is wrapped in a rownum query. That is incorrect. Apparently SQL Developer sets an arraysize for your session per your preference. Nevertheless, fetching 50 rows and stopping will always be faster than forcing a count of all rows.
Edit 2: Fair enough, I had thought I understood this question and the SQL Developer fetch size, but nope. I'll leave my answer here as a cautionary example of assumptions.
What happens if you run this query instead?
SELECT COUNT(*) AS ROW_COUNT
FROM CATAT
WHERE CATAT.ID = 1007642
AND CATAT.IS_PARENT = 1
AND EXISTS(SELECT 1 FROM CG WHERE CG.C_ID = CATAT.ID AND EXISTS(SELECT 1 FROM CCSD WHERE CCSD.G_ID = CG.ID))
I believe the problem is in the double join you have in the query,
Hope it helps!
EDIT:
To elaborate a little bit more, in your original query:
SELECT COUNT(*) AS ROW_COUNT
**FROM CATAT, CG, CCSD**
WHERE CATAT.ID = 1007642
AND CG.C_ID = CATAT.ID
AND CATAT.IS_PARENT = 1
AND CCSD.G_ID = CG.ID
The second line is the problem, when you list additional tables in a from clause in Oracle it means you are writing an implicit join IF and only if you list and match all the primary keys on each table with another column on a different table. Depending on the primary key components you add on the where clause it will result on a regular inner join (if you match all the primary key columns) or it can result on something similar to a cartesian product which I believe is the case by the plans you posted in the images, I can see a merge join with option cartesian in the query plan.
All this means that the database is generating a really big table, and the number of rows in that table is all the rows in CCSD * all the rows in CG * all the rows in CATAT (CCSD has a few millions as you stated so this results in the slowness you perceive) and after that is trying to traverse this temporary table checking the filters you have in place.
This problem is happening because the original query is not optimized for the task, and the one I posted is.
What I did was read you query to have an idea of what are you trying to do, you are trying to list a subset of the table CATAT with a specific ID and with IS_PARENT = 1, but you only want to list those ones whose ID (CATAT.ID) is on (or exists on) the table CG AND in the table CCSD. When writing the query I tried to use the same cascading you have in the conditionals, but the query I posted originally can be written like this also:
SELECT COUNT(*) AS ROW_COUNT
FROM CATAT
WHERE CATAT.ID = 1007642
AND CATAT.IS_PARENT = 1
AND EXISTS(SELECT 1 FROM CG WHERE CG.C_ID = CATAT.ID )
AND EXISTS(SELECT 1 FROM CCSD WHERE CCSD.G_ID = CATAT.ID)
Now this query does exactly the same as the original one you wrote, but without a join. To solve this query the database traverses the table CATAT matching by ID and IS_PARENT(having an index makes this really fast), once a row matches the first two conditions the databse tries to find an existing record by C_ID on the table CG (again really fast if you have an index) and after that it tries to do the same with the table CCSD by ID. This last 2 searches are in cascade in the first query I posted, but the idea is the same: your query is running slow because is creating a cartesian product (maybe optimized, but still resulting in a large number of rows) while the one I wrote is just traversing tables by ID (no merges), which probably already have indexes in those columns and that's why it is running fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With