Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between "TOP" and "SAMPLE" in TeraData SQL

Tags:

sql

teradata

What is the difference between "TOP" and "SAMPLE" in TeraData SQL? Are they the same?

like image 404
Steve Avatar asked Jun 16 '11 15:06

Steve


People also ask

What is the difference between sample and top command in SQL?

The SAMPLE command will give DIFFERENT results each time you run it. This will give the first 100 rows of the table. The TOP command will give you THE SAME results each time you run it. Thanks for contributing an answer to Stack Overflow!

Should I use top or sample for my queries?

If this query is the only thing running on your system, TOP may appear to always give you exactly the same answer, but that behavior is NOT guaranteed. SAMPLE, as you have observed, does extra processing to try to randomize the result set yet maintain the same approximate distribution.

What is the difference between top and qualify in SQL?

1. The QUALIFY clause with the RANK or ROW_NUMBER ordered analytical functions returns the same results as the TOP n operator. 2. For best performance, use the TOP option instead of the QUALIFY clause with RANK or ROW_NUMBER.

Should I use top or sample for dbql metrics?

The DBQL metrics show that by far, the Top 10 with no order is the least resource-intensive. I had about a 99% drop in I/O & CPU just changing from SAMPLE to TOP. So if your goal is purely efficiency, then TOP without the Order by is the clear winner going by TD's DBQL metrics. Can you add the DBQL metrics?


2 Answers

From TOP vs SAMPLE:

TOP 10 means "first 10 rows in sorted order". If you don't have an ORDER BY, then by extension it will be interpreted as asking for "ANY 10 rows" in any order. The optimizer is free to select the cheapest plan it can find and stop processing as soon as it has found enough rows to return.

If this query is the only thing running on your system, TOP may appear to always give you exactly the same answer, but that behavior is NOT guaranteed.

SAMPLE, as you have observed, does extra processing to try to randomize the result set yet maintain the same approximate distribution. At a very simple level, for example, it could pick a random point at which to start scanning the table and a number of rows to skip between rows that are returned.

like image 110
Yuck Avatar answered Oct 12 '22 22:10

Yuck


‘Sample’ command:

Sel * from tablename
sample 100

That will give you a sample of 100 different records from the table. The SAMPLE command will give DIFFERENT results each time you run it.

TOP command:

sel top 100 * from tablename;

This will give the first 100 rows of the table. The TOP command will give you THE SAME results each time you run it.

like image 31
Samihan Jawalkar Avatar answered Oct 12 '22 22:10

Samihan Jawalkar