Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How much is performance improved when using LIMIT in a SQL sentence?

Tags:

sql

database

take

Let's suppose I have a table in my database with 1.000.000 records.

If I execute:

SELECT * FROM [Table] LIMIT 1000

Will this query take the same time as if I have that table with 1000 records and just do:

SELECT * FROM [Table]

?

I'm not looking for if it will take exactly the same time. I just want to know if the first one will take much more time to execute than the second one.

I said 1.000.000 records, but it could be 20.000.000. That was just an example.

Edit:
Of course that when using LIMIT and without using it in the same table, the query built using LIMIT should be executed faster, but I'm not asking that...

To make it generic:

Table1: X records
Table2: Y records

(X << Y)

What I want to compare is:

SELECT * FROM Table1

and

SELECT * FROM Table2 LIMIT X

Edit 2:
Here is why I'm asking this:

I have a database, with 5 tables and relationships between some of them. One of those tables will (I'm 100% sure) contain about 5.000.000 records. I'm using SQL Server CE 3.5, Entity Framework as the ORM and LINQ to SQL to make the queries.

I need to perform basically three kind of non-simple queries, and I was thinking about showing to the user a limit of records (just like lot of websites do). If the user wants to see more records, the option he/she has is to restrict more the search.

So, the question came up because I was thinking about doing this (limiting to X records per query) or if storing in the database only X results (the recent ones), which will require to do some deletions in the database, but I was just thinking...

So, that table could contain 5.000.000 records or more, and what I don't want is to show the user 1000 or so, and even like this, the query still be as slow as if it would be returning the 5.000.000 rows.

like image 301
Oscar Mederos Avatar asked Apr 19 '11 04:04

Oscar Mederos


People also ask

Does limit in SQL improve performance?

Yes, you will notice a performance difference when dealing with the data. One record takes up less space than multiple records.

What does the limit clause do in SQL?

The LIMIT clause is used to specify the number of records to return. The LIMIT clause is useful on large tables with thousands of records. Returning a large number of records can impact performance.


2 Answers

TAKE 1000 from a table of 1000000 records - will be 1000000/1000 (= 1000) times faster because it only needs to look at (and return) 1000/1000000 records. Since it does less, it is naturally faster.

The result will be pretty (pseudo-)random, since you haven't specified any order in which to TAKE. However, if you do introduce an order, then one of two below becomes true:

  1. The ORDER BY clause follows an index - the above statement is still true.
  2. The ORDER BY clause cannot use any index - it will be only marginally faster than without the TAKE, because
    • it has to inspect ALL records, and sort by ORDER BY
    • deliver only a subset (TAKE count)
    • so it is not faster in the first step, but the 2nd step involves less IO/network than ALL records

If you TAKE 1000 records from a table of 1000 records, it will be equivalent (with little significant differences) to TAKE 1000 records from 1 billion, as long as you are following the case of (1) no order by, or (2) order by against an index

like image 110
RichardTheKiwi Avatar answered Sep 29 '22 12:09

RichardTheKiwi


Assuming both tables are equivalent in terms of index, row-sizing and other structures. Also assuming that you are running that simple SELECT statement. If you have an ORDER BY clause in your SQL statements, then obviously the larger table will be slower. I suppose you're not asking that.

If X = Y, then obviously they should run in similar speed, since the query engine will be going through the records in exactly the same order -- basically a table scan -- for this simple SELECT statement. There will be no difference in query plan.

If Y > X only by a little bit, then also similar speed.

However, if Y >> X (meaning Y has many many more rows than X), then the LIMIT version MAY be slower. Not because of query plan -- again should be the same -- but simply because that the internal structure of data layout may have several more levels. For example, if data is stored as leafs on a tree, there may be more tree levels, so it may take slightly more time to access the same number of pages.

In other words, 1000 rows may be stored in 1 tree level in 10 pages, say. 1000000 rows may be stored in 3-4 tree levels in 10000 pages. Even when taking only 10 pages from those 10000 pages, the storage engine still has to go through 3-4 tree levels, which may take slightly longer.

Now, if the storage engine stores data pages sequentially or as a linked list, say, then there will be no difference in execution speed.

like image 23
Stephen Chung Avatar answered Sep 29 '22 11:09

Stephen Chung