I'm new to Teradata and I'm facing a problem I didn't have with the previous database I used. Basically, I'm trying to reduce the number of rows returned in subqueries inside a where clause. I had no problem doing this previously with the ROWNUM function.
My previous query was something like:
SELECT * FROM myTable
WHERE field1 = 'foo' AND field2 in(
SELECT field2 FROM anotherTable
WHERE field3 = 'bar' AND ROWNUM<100);
Since I can't use ROWNUM in TD, I've looked for equivalent functions or at least functions that would get me where I wanted even if they were'nt exactly equivalent. I found and tried : ROW_NUMBER, TOP and SAMPLE.
I tried ROW_NUMBER() but Teradata doesn't allow analytic functions in WHERE clauses. I tried TOP N but this option is not supported in a subquery. I tried SAMPLE N but it is not supported in subqueries either.
So... I have to admit I'm a bit stuck right now and was wondering if there was any solution that would allow me to limit the number of rows returned in a subquery using Teradata and that would be pretty similar to what I did up to now? Also, if there aren't any, how would it be possible to build the query differently to use it appropriately with Teradata?
Thanks!
Sometimes it is useful to limit the number of rows that are returned from a query. This can be especially useful when querying very large domains. The Teradata database provides the ability to limit the number of rows returned from a query starting at the beginning of the results using the top keyword.
In Denodo,at this moment Limit is only supported at the end of a SELECT statement to reduce the size of the result set but it cannot be used in subqueries.
We can select the very recent sample records from our table using TOP. Syntax: SELECT TOP n* FROM table_name.
A subquery is correlated when it references columns of outer tables in an enclosing or containing outer query. The expression correlated subquery comes from the explicit requirement for the use of correlation names in any correlated subquery in which the same table is referenced in both the internal and external query.
The limited usage of SAMPLE or TOP in a subquery is probably because this might be a Correlated Subquery.
But there are two workarounds.
Put SAMPLE or TOP in a Derived Table within the subquery (so this can't be correlated anymore):
SELECT * FROM myTable
WHERE field1 = 'foo'
AND field2 IN (
SELECT * FROM
( SELECT field2 FROM anotherTable -- or TOP 100
WHERE field3 = 'bar' SAMPLE 100
) AS dt
);
Or rewrite it as a join to a Derived Table:
SELECT * FROM myTable
JOIN ( SELECT DISTNCT field2 FROM anotherTable -- can't use TOP if you need DISTINCT
WHERE field3 = 'bar' SAMPLE 100
) AS dt
WHERE field1 = 'foo'
AND myTable.field2 = dt.field1;
TOP without ORDER BY is quite similar to ROWNUM. It's not random at all, but running it a 2nd time might still return a different result set.
SAMPLE is truly random, every time returning a different result.
ROW_NUMBER is also possible using QUALIFY instead of WHERE, but OLAP functions always need some ORDER BY, so is much more overhead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With