Oracle 11g - why is SELECT COUNT(*) infinitely slower than SELECT *?

Tags: sql, oracle, oracle11g

I have a query in Oracle 11g like this:

SELECT *
FROM CATAT, CG, CCSD
WHERE CATAT.ID = 1007642
AND CG.C_ID = CATAT.ID
AND CATAT.IS_PARENT = 1
AND CCSD.G_ID = CG.ID

The query, in this instance, comes back with zero rows, and does so pretty much instantly. However, if I change it to this:

SELECT COUNT(*) AS ROW_COUNT
FROM CATAT, CG, CCSD
WHERE CATAT.ID = 1007642
AND CG.C_ID = CATAT.ID
AND CATAT.IS_PARENT = 1
AND CCSD.G_ID = CG.ID

It NEVER comes back - I have left the query running for over 5 minutes and it still hasn't finished. In fact, anything except SELECT * takes an extremely long time to run. E.g. SELECT CG.ID FROM..., or SELECT CATAT.* FROM...

The only thing unusual about this query is that the CCSD table has millions of rows of data in it. There is an index on CCSD.G_ID, so it can't be a lack of indexes.

I just don't understand why a query that returns zero rows instantly with a SELECT * should take so long if you do anything other than that? Can anyone shed some light on this?

UPDATE

Here is the explain plan for the SELECT * FROM... query: [image: explain plan 1, https://i.stack.imgur.com/aueU6.png]

Here is the explain plan for the SELECT COUNT(*) FROM... query: [image: https://i.stack.imgur.com/LN2Mc.png]

asked Oct 28 '13 17:10

user1578653

2 Answers

If you are evaluating the performance of your query in a SQL development environment like Toad or SQL Developer, it's not a true comparison. Most IDEs fetch only the first n rows (usually 50). ~~by wrapping your query with a~~

SELECT * FROM (your query) WHERE ROWNUM <= 50

Often with a stopkey hint. This means the DB only fetches the first 50 rows and stops. However, your SELECT COUNT(*) FROM ... is forcing the DB to actually count each and every row the query returns and that takes as long as it takes.

Edit: I was thinking of another Oracle product (Apex) when I stated your SQL Developer query is wrapped in a rownum query. That is incorrect. Apparently SQL Developer sets an arraysize for your session per your preference. Nevertheless, fetching 50 rows and stopping will always be faster than forcing a count of all rows.
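The "fetch a page and stop" versus "count everything" difference can be sketched like this. It is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for Oracle and an IDE-style client; the table T and the tick() helper are hypothetical names made up for the example, and tick() exists only to count how many rows the engine actually evaluates.

```python
import sqlite3

# SQLite stands in for Oracle here; table T and tick() are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (X INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(i,) for i in range(10000)])

calls = {"n": 0}

def tick(value):
    """Pass the value through and record that one more row was evaluated."""
    calls["n"] += 1
    return value

conn.create_function("tick", 1, tick)

# Client-style fetch: ask for the first 50 rows and then stop reading.
cur = conn.execute("SELECT tick(X) FROM T")
first_page = cur.fetchmany(50)
rows_touched_by_fetch = calls["n"]
cur.close()

# COUNT has to visit every row before it can return its single result.
calls["n"] = 0
total = conn.execute("SELECT COUNT(tick(X)) FROM T").fetchone()[0]
rows_touched_by_count = calls["n"]

print("rows fetched:", len(first_page), "rows evaluated:", rows_touched_by_fetch)
print("count result:", total, "rows evaluated:", rows_touched_by_count)
```

The paged fetch evaluates only about as many rows as it returns, while the count evaluates all of them. This only matters when the query actually produces rows, which is not the case in the question.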

Edit 2: Fair enough, I had thought I understood this question and the SQL Developer fetch size, but nope. I'll leave my answer here as a cautionary example of assumptions.

answered Oct 01 '22 05:10

Wolf

What happens if you run this query instead?

SELECT COUNT(*) AS ROW_COUNT
FROM CATAT
WHERE CATAT.ID = 1007642
AND CATAT.IS_PARENT = 1
AND EXISTS(SELECT 1 FROM CG WHERE CG.C_ID = CATAT.ID AND EXISTS(SELECT 1 FROM CCSD WHERE CCSD.G_ID = CG.ID))

I believe the problem is in the double join you have in the query.

Hope it helps!

EDIT:

To elaborate a little bit more, in your original query:

SELECT COUNT(*) AS ROW_COUNT
FROM CATAT, CG, CCSD   -- this is the line in question
WHERE CATAT.ID = 1007642
AND CG.C_ID = CATAT.ID
AND CATAT.IS_PARENT = 1
AND CCSD.G_ID = CG.ID

The second line is the problem. When you list additional tables in a FROM clause in Oracle, you are writing an implicit join, and it only behaves like a regular join if the WHERE clause matches the primary key columns of each table against a column in another table. Depending on which primary key components you include in the WHERE clause, the result is either a regular inner join (if you match all the primary key columns) or something similar to a cartesian product. I believe the latter is what is happening here, judging by the plans you posted in the images: I can see a MERGE JOIN with the CARTESIAN option in the query plan.

All this means that the database is generating a really big intermediate table whose row count is (all the rows in CCSD) * (all the rows in CG) * (all the rows in CATAT). CCSD has a few million rows, as you stated, which explains the slowness you are seeing. Only after that does it traverse this temporary table and check the filters you have in place.
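To make the row arithmetic above concrete, here is a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for Oracle, with tiny tables that contain only the columns named in the question; the data is invented for illustration. It only shows the logical result of a comma-separated FROM list with and without join conditions, and says nothing about which plan Oracle's optimizer actually picks.

```python
import sqlite3

# SQLite stands in for Oracle; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CATAT (ID INTEGER PRIMARY KEY, IS_PARENT INTEGER);
CREATE TABLE CG    (ID INTEGER PRIMARY KEY, C_ID INTEGER);
CREATE TABLE CCSD  (G_ID INTEGER);
INSERT INTO CATAT VALUES (1, 1), (2, 1), (3, 0);
INSERT INTO CG    VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO CCSD  VALUES (10), (10), (11), (12);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# No join conditions at all: every row is paired with every other row.
cartesian = scalar("SELECT COUNT(*) FROM CATAT, CG, CCSD")

# Comma-separated FROM list with the join conditions in the WHERE clause.
implicit = scalar("""
    SELECT COUNT(*) FROM CATAT, CG, CCSD
    WHERE CATAT.ID = 1 AND CG.C_ID = CATAT.ID
      AND CATAT.IS_PARENT = 1 AND CCSD.G_ID = CG.ID
""")

# The same conditions written with explicit JOIN ... ON syntax.
explicit = scalar("""
    SELECT COUNT(*) FROM CATAT
    JOIN CG   ON CG.C_ID = CATAT.ID
    JOIN CCSD ON CCSD.G_ID = CG.ID
    WHERE CATAT.ID = 1 AND CATAT.IS_PARENT = 1
""")

print(cartesian, implicit, explicit)
```

With no conditions the count is the product of the three table sizes (3 * 3 * 4 here). With the conditions from the question in the WHERE clause, the comma form and the explicit JOIN form return the same rows.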

This problem is happening because the original query is not optimized for the task, and the one I posted is.

What I did was read your query to get an idea of what you are trying to do. You are trying to list the subset of the table CATAT with a specific ID and with IS_PARENT = 1, but only those rows whose ID (CATAT.ID) exists in the table CG AND in the table CCSD. When writing the query I tried to keep the same cascading you have in the conditions, but the query I originally posted can also be written like this:

SELECT COUNT(*) AS ROW_COUNT
FROM CATAT
WHERE CATAT.ID = 1007642
AND CATAT.IS_PARENT = 1
AND EXISTS(SELECT 1 FROM CG WHERE CG.C_ID = CATAT.ID )
AND EXISTS(SELECT 1 FROM CCSD WHERE CCSD.G_ID = CATAT.ID)

This query is meant to do the same job as the original one you wrote, but without a join. (Note that this flat form compares CCSD.G_ID with CATAT.ID, whereas the original query and the nested version above compare it with CG.ID.) To resolve it, the database traverses the table CATAT matching by ID and IS_PARENT (an index makes this really fast). Once a row matches those two conditions, the database looks for an existing record by C_ID in the table CG (again really fast if you have an index), and after that it does the same with the table CCSD by ID. These last two lookups are cascaded in the first query I posted, but the idea is the same: your query runs slowly because it creates a cartesian product (maybe optimized, but still resulting in a large number of rows), while the one I wrote just traverses the tables by ID (no merges). Those columns probably already have indexes, and that is why it runs fast.
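Before swapping one of these rewrites in, it may be worth checking what each form actually counts. The sketch below is an assumption-laden illustration: Python's built-in sqlite3 module stands in for Oracle, the tables hold only the columns named in the question, and the sample rows and the count_for() helper are invented. It runs the original join, the nested EXISTS query, and the flat EXISTS query against the same data.

```python
import sqlite3

# SQLite stands in for Oracle; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CATAT (ID INTEGER PRIMARY KEY, IS_PARENT INTEGER);
CREATE TABLE CG    (ID INTEGER PRIMARY KEY, C_ID INTEGER);
CREATE TABLE CCSD  (G_ID INTEGER);
INSERT INTO CATAT VALUES (1, 1), (2, 1), (3, 0);
INSERT INTO CG    VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO CCSD  VALUES (10), (10), (11), (12);
""")

JOIN_COUNT = """
SELECT COUNT(*) FROM CATAT, CG, CCSD
WHERE CATAT.ID = ? AND CG.C_ID = CATAT.ID
  AND CATAT.IS_PARENT = 1 AND CCSD.G_ID = CG.ID
"""

NESTED_EXISTS = """
SELECT COUNT(*) FROM CATAT
WHERE CATAT.ID = ? AND CATAT.IS_PARENT = 1
  AND EXISTS (SELECT 1 FROM CG WHERE CG.C_ID = CATAT.ID
              AND EXISTS (SELECT 1 FROM CCSD WHERE CCSD.G_ID = CG.ID))
"""

FLAT_EXISTS = """
SELECT COUNT(*) FROM CATAT
WHERE CATAT.ID = ? AND CATAT.IS_PARENT = 1
  AND EXISTS (SELECT 1 FROM CG WHERE CG.C_ID = CATAT.ID)
  AND EXISTS (SELECT 1 FROM CCSD WHERE CCSD.G_ID = CATAT.ID)
"""

def count_for(sql, catat_id):
    """Run one of the COUNT queries for a given CATAT.ID (hypothetical helper)."""
    return conn.execute(sql, (catat_id,)).fetchone()[0]

for catat_id in (1, 3):
    print(catat_id,
          count_for(JOIN_COUNT, catat_id),
          count_for(NESTED_EXISTS, catat_id),
          count_for(FLAT_EXISTS, catat_id))
```

On this sample data the join counts every matching CG/CCSD combination, the nested EXISTS form counts matching CATAT rows, and the flat form finds nothing because no CCSD.G_ID equals a CATAT.ID. The join and the nested form agree on zero versus non-zero, which is enough if the count is only used as an existence check.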

answered Oct 04 '22 05:10

Sergio Ayestarán
