I have a table with around 5 million rows that looks like this:
Erp_in:
corr_id varchar(50) (almost unique)
corr_type nvarchar(1) (4 distinct values)
interface varchar(20) (around 10 distinct values)
indate DateTime
There are 3 separate indexes, on corr_id, interface, and indate.
I also have another table, with about 100,000 rows, which I usually left join with the original table:
Erp_In_failed:
corr_id
interface
error (clob)
input (clob)
with indexes on corr_id and interface.
The query I want to optimize is as simple as this:
SELECT a.corr_id, a.interface, a.indate, b.error
FROM erp_in a
LEFT JOIN erp_in_failed b
  ON a.corr_id = b.corr_id AND a.interface = b.interface
ORDER BY a.indate DESC;
If I remove the ORDER BY, the query doesn't take that long, but ordering the data takes around 3 minutes, if not more.
What can I do to optimize the query? I was thinking of partitioning, moving old data to a history table, maybe creating a sequence primary key and ordering by it, or anything else you have in mind...
Edit:
The execution plan shows a full table scan, and it's not the join that takes so long, it's the ORDER BY.
Even this query takes forever:
SELECT * FROM erp_in
ORDER BY indate;
I have tried using paging, but that doesn't work either; it takes a few minutes to return 20 results. Maybe I am doing it wrong?
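For reference, the standard ROWNUM paging pattern for this query would look something like this (just the usual idiom, not necessarily exactly what I ran):

SELECT *
  FROM (SELECT x.*, ROWNUM rn
          FROM (SELECT a.corr_id, a.interface, a.indate, b.error
                  FROM erp_in a
                  LEFT JOIN erp_in_failed b
                    ON a.corr_id = b.corr_id
                   AND a.interface = b.interface
                 ORDER BY a.indate DESC) x
         WHERE ROWNUM <= 20)   -- upper bound: last row of the requested page
 WHERE rn > 0;                 -- lower bound: last row of the previous page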
If I add a WHERE clause on the indate field, it uses the index, but only when the range is less than 20 days old; anything wider than that still uses a full table scan. (Even with 40 days, adding an INDEX hint made the query run faster, but still not fast enough.)
And just out of curiosity: I have a simple table with 1 million rows, and ORDER BY on it takes a few seconds. What's the difference? Is 1 million rows small enough to sort in RAM?
Thanks,
You're sorting five million rows, including about one hundred thousand CLOBs. It is unlikely you'll be able to do that in memory; if you run the query in SQL*Plus with statistics on, you should see it writing out to disk a lot.
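One way to see this (assuming your account has the PLUSTRACE role, which AUTOTRACE statistics require):

SET AUTOTRACE TRACEONLY STATISTICS

SELECT a.corr_id, a.interface, a.indate, b.error
FROM erp_in a
LEFT JOIN erp_in_failed b
  ON a.corr_id = b.corr_id AND a.interface = b.interface
ORDER BY a.indate DESC;

-- In the statistics output, a non-zero "sorts (disk)" figure and a large
-- "physical writes" count confirm the sort is spilling to temp.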
One way of improving the performance would be to give Oracle more memory to sort in (that's the PGA rather than the buffer cache, since sorts spill from the session's work area to temp), but that may be an expensive option and not a quick one either.
Building a composite index on erp_in(indate DESC, corr_id, interface), as sketched below, could help as well, because the results from the driving table will be pre-sorted and so the ordering should take less work. Partitioning by INDATE might have a similar effect, but partitioning is a chargeable extra to the Enterprise Edition and hence is not a cheap fix (gobs more memory will probably be a lot cheaper).
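A minimal sketch (the index name is made up). One caveat: the optimizer will only use the index to avoid the sort if it can prove every row appears in the index, which in practice means at least one of the indexed columns should be declared NOT NULL:

CREATE INDEX erp_in_sort_ix
    ON erp_in (indate DESC, corr_id, interface);

-- If the data allows it, this guarantees the index covers every row:
ALTER TABLE erp_in MODIFY (indate NOT NULL);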
Your reference to archiving old data suggests that you don't actually want to retrieve all the records. If that is the case, reducing the size of the result set with a WHERE clause would help a lot. The easiest way to tune something is not to do the work in the first place.
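For example, if the users only ever need recent records (the 30-day window here is an assumption; pick whatever your users actually look at):

SELECT a.corr_id, a.interface, a.indate, b.error
FROM erp_in a
LEFT JOIN erp_in_failed b
  ON a.corr_id = b.corr_id AND a.interface = b.interface
WHERE a.indate >= TRUNC(SYSDATE) - 30   -- assumed retention window
ORDER BY a.indate DESC;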
Adding a primary key and ordering by it won't reduce the amount of effort required to do the actual sort.
"so should I partition by date? will it help without adding WHERE clause on INDATE field"
It depends. Partitioning imposes some physical organization on the table, so the rows will (at least) require less sorting. How much less depends on the granularity of the partitioning: partition by a range of one day and the table is pretty much already in INDATE order; partition by a range of one year and it is much less so.
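For illustration, daily partitioning could be declared like this (interval partitioning needs 11g or later; the column definitions and the starting date are assumptions based on your description):

CREATE TABLE erp_in (
    corr_id   VARCHAR2(50),
    corr_type NVARCHAR2(1),
    interface VARCHAR2(20),
    indate    DATE NOT NULL
)
PARTITION BY RANGE (indate)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))   -- one partition per day, created on demand
(
    PARTITION p_start VALUES LESS THAN (DATE '2012-01-01')
);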
However, bear in mind that partitioning is not primarily a performance option. It is an option for managing data (especially loading) and availability. In fact, it may degrade the performance of some queries, namely those which don't suit the ordering applied by the partition key.
So, should you partition by date? Not a question we can answer. Answering it requires the sort of in-depth knowledge of your system which you have and we lack. But if you have the licenses it is certainly an option you should investigate and benchmark.