What is the difference between "TOP" and "SAMPLE" in TeraData SQL? Are they the same?
The SAMPLE command will give DIFFERENT results each time you run it. This will give the first 100 rows of the table. The TOP command will give you THE SAME results each time you run it. Thanks for contributing an answer to Stack Overflow!
If this query is the only thing running on your system, TOP may appear to always give you exactly the same answer, but that behavior is NOT guaranteed. SAMPLE, as you have observed, does extra processing to try to randomize the result set yet maintain the same approximate distribution.
1. The QUALIFY clause with the RANK or ROW_NUMBER ordered analytical functions returns the same results as the TOP n operator. 2. For best performance, use the TOP option instead of the QUALIFY clause with RANK or ROW_NUMBER.
The DBQL metrics show that by far, the Top 10 with no order is the least resource-intensive. I had about a 99% drop in I/O & CPU just changing from SAMPLE to TOP. So if your goal is purely efficiency, then TOP without the Order by is the clear winner going by TD's DBQL metrics. Can you add the DBQL metrics?
From TOP vs SAMPLE:
TOP 10 means "first 10 rows in sorted order". If you don't have an ORDER BY, then by extension it will be interpreted as asking for "ANY 10 rows" in any order. The optimizer is free to select the cheapest plan it can find and stop processing as soon as it has found enough rows to return.
If this query is the only thing running on your system, TOP may appear to always give you exactly the same answer, but that behavior is NOT guaranteed.
SAMPLE, as you have observed, does extra processing to try to randomize the result set yet maintain the same approximate distribution. At a very simple level, for example, it could pick a random point at which to start scanning the table and a number of rows to skip between rows that are returned.
‘Sample’ command:
Sel * from tablename
sample 100
That will give you a sample of 100 different records from the table. The SAMPLE command will give DIFFERENT results each time you run it.
TOP command:
sel top 100 * from tablename;
This will give the first 100 rows of the table. The TOP command will give you THE SAME results each time you run it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With