Say I have a table:
CREATE TABLE T
(
TableDTM TIMESTAMP NOT NULL,
Code INT NOT NULL
);
And I insert some rows:
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:00:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:10:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:20:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:40:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:50:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:00:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:10:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:20:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:40:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:50:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:00:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:10:00', 3);
So I end up with a table similar to:
2011-01-13 10:00:00, 5
2011-01-13 10:10:00, 5
2011-01-13 10:20:00, 5
2011-01-13 10:30:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:00:00, 1
2011-01-13 11:10:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:40:00, 5
2011-01-13 11:50:00, 3
2011-01-13 12:00:00, 3
2011-01-13 12:10:00, 3
How can I select the first date of each set of identical numbers, so I end up with this:
2011-01-13 10:00:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:50:00, 3
I've been messing about with subqueries and the like for most of the day, and for some reason I can't seem to crack it. I'm sure there's a simple way somewhere!
I would probably want to exclude the 0s from the results, but that's not important for now.
I'm sure there's a simple way somewhere!
Yes, there is. But first, two issues.
The table is not a Relational Database table. It does not have a unique key, which is demanded by the RM and Normalisation (specifically, each row must have a unique identifier; not necessarily a PK). Therefore SQL, a standard language for operating on Relational Database tables, cannot reliably perform basic operations on it, such as identifying one specific row.
So the question really is "SQL to find the first occurrence of sets of data in a non-relational Heap".
Now if your question was "SQL to find the first occurrence of sets of data in a Relational table", implying of course some unique row identifier, that would be (a) easy in SQL, and (b) fast in any flavour of SQL ...
The question is very generic (no complaint). But many of these specific needs are usually applied within a larger context, and the context has requirements which are absent from the specification here. Generally the need is for a simple Subquery (but in Oracle use a Materialised View to avoid the subquery). And the subquery, too, depends on the outer context, the outer query. Therefore the answer to the small generic question will not contain the answer to the actual specific need.
Anyway, I do not wish to avoid the question. Why don't we use a real world example, rather than a simple generic one, and find the first or last occurrence, or minimum or maximum value, of a set of data, within another set of data, in a Relational table?
Main Query
Let's use the ▶Data Model◀ from your previous question.
Report all Alerts since a certain date, with the peak Value for the duration, that are not Acknowledged
Since you will be using exactly the same technique (with different table and column names) for all your temporal and History requirements, you need to fully understand the basic construct of a Subquery, and its different applications.
Note that you have not only a pure 5NF Database with Relational Identifiers (composite keys), but also full Temporal capability throughout, and the temporal requirement is rendered without breaking 5NF (no Update Anomalies), which means the ValidToDateTime for periods and durations is derived, and not duplicated in data. The point is that this complicates things, hence this is not the best example for a tutorial on Subqueries.
First build the Outer query using minimum joins, etc, based on the structure of the result set that you need, and nothing more. It is very important that the structure of the outer query is resolved first; otherwise you will go back and forth trying to make the subquery fit the outer query, and vice versa.
Alerts after a certain date
The ▶SQL code◀ required is on page 1 (sorry, the SO edit features are horrible, it destroys the formatting, and the code is already formatted).
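The linked pages are not reproduced here, so the following is only a minimal sketch of what such an Outer query might look like, run through Python's sqlite3 module so it can be executed as-is. The Sensor and Alert tables, their columns, and the sample rows are simplified assumptions for illustration, not the actual Data Model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema (an assumption, not the real Data Model)
    CREATE TABLE Sensor (SensorId INTEGER PRIMARY KEY, UpperLimit REAL NOT NULL);
    CREATE TABLE Alert  (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                         PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Sensor VALUES (1, 50.0);
    INSERT INTO Alert VALUES
        (1, '2011-01-12 09:00:00'),
        (1, '2011-01-13 10:00:00'),
        (1, '2011-01-13 11:30:00');
""")

# Outer query only: minimum joins, no Subqueries yet.
# It fixes the shape of the result set (one row per Alert since the date).
outer_rows = conn.execute("""
    SELECT a.SensorId, a.ReadingDtm, s.UpperLimit
    FROM Alert a
    JOIN Sensor s ON s.SensorId = a.SensorId
    WHERE a.ReadingDtm >= ?
    ORDER BY a.SensorId, a.ReadingDtm
""", ("2011-01-13 00:00:00",)).fetchall()

for row in outer_rows:
    print(row)
```

Each remaining column of the report is then a cell to be filled in per row, which is where the Subqueries come in.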
Then build the Subquery to fill each cell.
Subquery (1) Derive Alert.Value
That is a simple derived datum: select the Value from the Reading that generated the Alert. The tables are related, the cardinality is 1::1, so it is a straight join on the PK.
The ▶SQL code◀ required is on page 2.
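As a rough illustration of that kind of Correlated Scalar Subquery (again a sketch in Python's sqlite3, over a hypothetical simplified Reading/Alert schema that is assumed here, not taken from the Data Model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data (assumptions for illustration)
    CREATE TABLE Reading (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          Value REAL NOT NULL, PRIMARY KEY (SensorId, ReadingDtm));
    CREATE TABLE Alert   (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Reading VALUES
        (1, '2011-01-13 10:00:00', 40.0),
        (1, '2011-01-13 10:10:00', 60.0),
        (1, '2011-01-13 10:20:00', 75.0),
        (1, '2011-01-13 10:30:00', 45.0),
        (1, '2011-01-13 10:40:00', 55.0),
        (1, '2011-01-13 10:50:00', 48.0);
    INSERT INTO Alert VALUES
        (1, '2011-01-13 10:10:00'),
        (1, '2011-01-13 10:40:00');
""")

# The Subquery is correlated on the full key of the Alert row, so it
# returns exactly one Value per Alert (1::1 with Reading).
value_rows = conn.execute("""
    SELECT a.SensorId,
           a.ReadingDtm,
           (SELECT r.Value
            FROM Reading r
            WHERE r.SensorId   = a.SensorId
              AND r.ReadingDtm = a.ReadingDtm) AS Value
    FROM Alert a
    ORDER BY a.SensorId, a.ReadingDtm
""").fetchall()

for row in value_rows:
    print(row)
```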
I have purposely given you a mix of joins in the Outer Query vs obtaining data via Subquery, so that you can learn (you could alternately obtain Alert.Value via a join, but that would be even more cumbersome).
The next Subquery we need derives Alert.PeakValue. For that we need to determine the Temporal Duration of the Alert. We have the beginning of the Alert Duration; we need to determine the end of the Duration, which is the next (temporally) Reading.Value that is within range. That requires a Subquery as well, which we had better handle first.
Subquery (2) Derive Alert.EndDtm
A slightly more complex Subquery to select the first Reading.ReadingDtm that is greater than or equal to the Alert.ReadingDtm, and that has a Reading.Value which is less than or equal to its Sensor.UpperLimit.
Handling 5NF Temporal Data
For handling temporal requirements in a 5NF Database (in which EndDateTime is not stored, as it is duplicate data), we work on a StartDateTime only, and the EndDateTime is derived: it is the next StartDateTime. This is the Temporal notion of Duration.
We can treat EndDateTime as simply the Next.StartDateTime, and ignore the one millisecond issue.
The code should use >= This.StartDateTime and < Next.StartDateTime.
These temporal comparisons are independent of the business-logic comparisons, e.g. against Sensor.UpperLimit (i.e. watch for it, because both are often located in one WHERE clause, and it is easy to mix them up or get confused).
The ▶SQL code◀ required, along with test data used, is on page 3.
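A minimal sketch of deriving the end of an Alert this way, using Python's sqlite3 and a hypothetical simplified schema (the table layout, column names and sample data are assumptions). The comments mark which comparison is temporal and which is business logic, since both sit in the same WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data (assumptions for illustration)
    CREATE TABLE Sensor  (SensorId INTEGER PRIMARY KEY, UpperLimit REAL NOT NULL);
    CREATE TABLE Reading (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          Value REAL NOT NULL, PRIMARY KEY (SensorId, ReadingDtm));
    CREATE TABLE Alert   (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Sensor VALUES (1, 50.0);
    INSERT INTO Reading VALUES
        (1, '2011-01-13 10:00:00', 40.0),
        (1, '2011-01-13 10:10:00', 60.0),
        (1, '2011-01-13 10:20:00', 75.0),
        (1, '2011-01-13 10:30:00', 45.0),
        (1, '2011-01-13 10:40:00', 55.0),
        (1, '2011-01-13 10:50:00', 48.0);
    INSERT INTO Alert VALUES
        (1, '2011-01-13 10:10:00'),
        (1, '2011-01-13 10:40:00');
""")

end_rows = conn.execute("""
    SELECT a.SensorId,
           a.ReadingDtm,
           (SELECT MIN(r.ReadingDtm)
            FROM Reading r
            JOIN Sensor s ON s.SensorId = r.SensorId
            WHERE r.SensorId    = a.SensorId
              AND r.ReadingDtm >= a.ReadingDtm   -- temporal comparison
              AND r.Value      <= s.UpperLimit   -- business-logic comparison
           ) AS EndDtm
    FROM Alert a
    ORDER BY a.SensorId, a.ReadingDtm
""").fetchall()

for row in end_rows:
    print(row)
```

In this sketch an Alert that has not ended yet would simply get a NULL EndDtm, because MIN over no rows is NULL.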
Subquery (3) Derive Alert.PeakValue
Now it is easy. Select the MAX(Value) from Readings between Alert.ReadingDtm and Alert.EndDtm, the duration of the Alert.
The ▶SQL code◀ required is on page 4.
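For illustration only, here is one way the peak could be derived in a single statement, nesting the end-of-duration Subquery inside the MAX(Value) Subquery. It is a sketch in Python's sqlite3 over a hypothetical simplified schema (an assumption), and it uses the half-open comparison (>= start, < end) described under Handling 5NF Temporal Data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data (assumptions for illustration)
    CREATE TABLE Sensor  (SensorId INTEGER PRIMARY KEY, UpperLimit REAL NOT NULL);
    CREATE TABLE Reading (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          Value REAL NOT NULL, PRIMARY KEY (SensorId, ReadingDtm));
    CREATE TABLE Alert   (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Sensor VALUES (1, 50.0);
    INSERT INTO Reading VALUES
        (1, '2011-01-13 10:00:00', 40.0),
        (1, '2011-01-13 10:10:00', 60.0),
        (1, '2011-01-13 10:20:00', 75.0),
        (1, '2011-01-13 10:30:00', 45.0),
        (1, '2011-01-13 10:40:00', 55.0),
        (1, '2011-01-13 10:50:00', 48.0);
    INSERT INTO Alert VALUES
        (1, '2011-01-13 10:10:00'),
        (1, '2011-01-13 10:40:00');
""")

peak_rows = conn.execute("""
    SELECT a.SensorId,
           a.ReadingDtm,
           (SELECT MAX(r.Value)
            FROM Reading r
            WHERE r.SensorId    = a.SensorId
              AND r.ReadingDtm >= a.ReadingDtm            -- start of Duration
              AND r.ReadingDtm <  (                       -- derived end of Duration
                    SELECT MIN(e.ReadingDtm)
                    FROM Reading e
                    JOIN Sensor s ON s.SensorId = e.SensorId
                    WHERE e.SensorId    = a.SensorId
                      AND e.ReadingDtm >= a.ReadingDtm
                      AND e.Value      <= s.UpperLimit)
           ) AS PeakValue
    FROM Alert a
    ORDER BY a.SensorId, a.ReadingDtm
""").fetchall()

for row in peak_rows:
    print(row)
```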
Scalar Subquery
In addition to being Correlated Subqueries, the above are all Scalar Subqueries, as they return a single value; each cell in the grid can be filled with only one value. (Non-Scalar Subqueries, that return multiple values, are quite legal, but not for the above.)
Subquery (4) Acknowledged Alerts
Ok, now that you have a handle on the above Correlated Scalar Subqueries, those that fill cells in a set, a set that is defined by the Outer query, let's look at a Subquery that can be used to constrain the Outer query. We do not really want all Alerts (above), we want Un-Acknowledged Alerts: the Identifiers that exist in Alert, that do not exist in Acknowledgement. That is not filling cells, that is changing the content of the Outer set. Of course, that means changing the WHERE clause.
Leave the FROM and existing WHERE clauses as they are. Simply add a WHERE condition to exclude the set of Acknowledged Alerts. 1::1 cardinality, straight Correlated join.
The ▶SQL code◀ required is on page 5.
The difference is, this is a non-Scalar Subquery, producing a set of rows (one column). We have an entire set of Alerts (the Outer set) matched against an entire set of Acknowledgements.
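A minimal sketch of that constraint, written with WHERE NOT EXISTS in Python's sqlite3. The Alert and Acknowledgement tables shown are hypothetical simplifications (assumed columns and data), not the actual Data Model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data (assumptions for illustration)
    CREATE TABLE Alert (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                        PRIMARY KEY (SensorId, ReadingDtm));
    CREATE TABLE Acknowledgement (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                                  AckDtm TEXT NOT NULL,
                                  PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Alert VALUES
        (1, '2011-01-13 10:10:00'),
        (1, '2011-01-13 10:40:00'),
        (1, '2011-01-13 11:20:00');
    INSERT INTO Acknowledgement VALUES
        (1, '2011-01-13 10:10:00', '2011-01-13 10:15:00');
""")

# The Outer query is unchanged apart from one extra WHERE condition:
# keep only Alerts for which no matching Acknowledgement exists.
unacked_rows = conn.execute("""
    SELECT a.SensorId, a.ReadingDtm
    FROM Alert a
    WHERE NOT EXISTS (SELECT 1
                      FROM Acknowledgement k
                      WHERE k.SensorId   = a.SensorId
                        AND k.ReadingDtm = a.ReadingDtm)
    ORDER BY a.SensorId, a.ReadingDtm
""").fetchall()

for row in unacked_rows:
    print(row)
```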
The Subquery selects 1, because we are performing an existence check. Visualise it as a column added onto the Alert set defined by the Outer query.
In some cases WHERE NOT IN () is required, but again, that constructs the defined column set, then compares the two sets. Much slower.
Subquery (5) Actioned Alerts
As an alternative constraint on the Outer query, for un-actioned Alerts, instead of (4), exclude the set of Actioned Alerts. Straight Correlated join.
The ▶SQL code◀ required is on page 5.
This code has been tested on Sybase ASE 15.0.3 using 1000 Alerts and 200 Acknowledgements, of different combinations; and the Readings and Alerts identified in the document. Zero milliseconds execution time (0.003 second resolution) for all executions.
If you need it, here is the ▶SQL Code in Text Format◀.
(6) ▶Register Alert from Reading◀
This code executes in a loop (provided), selecting new Readings which are out-of-range, and creating Alerts, except where applicable Alerts already exist.
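The linked code is not shown here. As a rough sketch of the core idea only (no loop), the statement below inserts an Alert for each out-of-range Reading unless an Alert with the same key already exists. It uses Python's sqlite3 and a hypothetical simplified schema; treating "applicable Alert" as "an Alert with the same key" is a simplifying assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema and data (assumptions for illustration)
    CREATE TABLE Sensor  (SensorId INTEGER PRIMARY KEY, UpperLimit REAL NOT NULL);
    CREATE TABLE Reading (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          Value REAL NOT NULL, PRIMARY KEY (SensorId, ReadingDtm));
    CREATE TABLE Alert   (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Sensor VALUES (1, 50.0);
    INSERT INTO Reading VALUES
        (1, '2011-01-13 10:00:00', 40.0),
        (1, '2011-01-13 10:10:00', 60.0),
        (1, '2011-01-13 10:20:00', 45.0),
        (1, '2011-01-13 10:30:00', 55.0);
    INSERT INTO Alert VALUES (1, '2011-01-13 10:10:00');  -- already registered
""")

register_sql = """
    INSERT INTO Alert (SensorId, ReadingDtm)
    SELECT r.SensorId, r.ReadingDtm
    FROM Reading r
    JOIN Sensor s ON s.SensorId = r.SensorId
    WHERE r.Value > s.UpperLimit                      -- out-of-range Reading
      AND NOT EXISTS (SELECT 1                        -- no Alert for it yet
                      FROM Alert a
                      WHERE a.SensorId   = r.SensorId
                        AND a.ReadingDtm = r.ReadingDtm)
"""

# Running the statement twice shows it is safe to repeat:
# the second run finds nothing new to register.
conn.execute(register_sql)
conn.execute(register_sql)

alerts = conn.execute(
    "SELECT SensorId, ReadingDtm FROM Alert ORDER BY SensorId, ReadingDtm"
).fetchall()
print(alerts)
```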
(7) ▶Load Alert From Reading◀
Given that you have a full set of test data for Reading, this code uses a modified form of (6) to load the applicable Alerts.
It is "simple" when you know how. I repeat, writing SQL without the ability to write Subqueries is very limiting; it is essential for handling Relational Databases, which is what SQL was designed for.
I think you can figure out the remaining queries you have.
Note, this example also happens to demonstrate the power of using Relational Identifiers, in that several tables in-between the ones we want do not have to be joined (yes! the truth is that Relational Identifiers mean fewer, not more, joins than Id keys). Simply follow the solid lines.
[...] DateTime. Imagine trying to code the above with Id PKs: there would be two levels of processing, one for the joins (and there would be far more of them), and another for the data processing.
I try to stay away from colloquial labels ("nested", "inner", etc.) because they are not specific, and stick to specific technical terms. For completeness and understanding:
A Subquery in the FROM clause is a Materialised View (more commonly called a derived table or inline view): a result set derived in one query and then fed into the FROM clause of another query, as a "table".
A Subquery in the WHERE clause is a Predicate Subquery, because it changes the content of the result set (that which it is predicated upon). It can return either a Scalar (one value) or non-Scalar (many values).
for Scalars, use WHERE column =, or any scalar operator
for non-Scalars, use WHERE [NOT] EXISTS, or WHERE column [NOT] IN
A Subquery in the WHERE clause does not need to be Correlated; the following works just fine. Identify all superfluous appendages:
SELECT [Never] = FirstName,
       [Acted] = LastName
FROM User
WHERE UserId NOT IN ( SELECT DISTINCT UserId
                      FROM Action
                      )
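To make the distinctions above concrete, here is a small sketch in Python's sqlite3 showing a Subquery in the FROM clause, a Scalar Predicate Subquery, and a non-Scalar Predicate Subquery. The Reading table and its data are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and data (assumptions for illustration)
    CREATE TABLE Reading (SensorId INTEGER NOT NULL, ReadingDtm TEXT NOT NULL,
                          Value REAL NOT NULL, PRIMARY KEY (SensorId, ReadingDtm));
    INSERT INTO Reading VALUES
        (1, '2011-01-13 10:00:00', 40.0),
        (1, '2011-01-13 10:10:00', 60.0),
        (2, '2011-01-13 10:00:00', 30.0),
        (2, '2011-01-13 10:10:00', 45.0);
""")

# Subquery in the FROM clause: its result set is used as a "table".
from_rows = conn.execute("""
    SELECT p.SensorId, p.PeakValue
    FROM (SELECT SensorId, MAX(Value) AS PeakValue
          FROM Reading
          GROUP BY SensorId) AS p
    WHERE p.PeakValue > 50
""").fetchall()

# Scalar Predicate Subquery: one value, compared with a scalar operator.
scalar_rows = conn.execute("""
    SELECT SensorId, ReadingDtm
    FROM Reading
    WHERE Value = (SELECT MAX(Value) FROM Reading)
""").fetchall()

# Non-Scalar Predicate Subquery (not Correlated): many values, compared with IN.
in_rows = conn.execute("""
    SELECT SensorId, ReadingDtm
    FROM Reading
    WHERE SensorId IN (SELECT SensorId FROM Reading WHERE Value > 50)
    ORDER BY SensorId, ReadingDtm
""").fetchall()

print(from_rows, scalar_rows, in_rows)
```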
Try this:
SELECT MIN(TableDTM) TableDTM, Code
FROM
(
SELECT T1.TableDTM, T1.Code, MIN(T2.TableDTM) XTableDTM
FROM T T1
LEFT JOIN T T2
ON T1.TableDTM <= T2.TableDTM
AND T1.Code <> T2.Code
GROUP BY T1.TableDTM, T1.Code
) X
GROUP BY XTableDTM, Code
ORDER BY 1;
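To check the query against the sample data from the question, here is a small harness using Python's sqlite3 (SQLite is an assumption here; the question does not name a database). The second query is a variation that also drops the 0 rows, as mentioned in the question; the alias FirstDTM is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (TableDTM TIMESTAMP NOT NULL, Code INT NOT NULL)")

# Sample rows from the question: one row every 10 minutes from 10:00.
codes = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
data = [
    ("2011-01-13 %02d:%02d:00" % (10 + (i * 10) // 60, (i * 10) % 60), code)
    for i, code in enumerate(codes)
]
conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", data)

# The inner query tags each row with the time of the next row that has a
# different Code; rows of the same run share that tag, so grouping on it
# and taking MIN(TableDTM) yields the first row of each run.
inner = """
    SELECT T1.TableDTM, T1.Code, MIN(T2.TableDTM) XTableDTM
    FROM T T1
    LEFT JOIN T T2
      ON T1.TableDTM <= T2.TableDTM
     AND T1.Code <> T2.Code
    GROUP BY T1.TableDTM, T1.Code
"""

first_rows = conn.execute(
    "SELECT MIN(TableDTM) AS FirstDTM, Code FROM (" + inner + ") X "
    "GROUP BY XTableDTM, Code ORDER BY FirstDTM"
).fetchall()

# Variation: exclude the 0 codes after the runs have been identified.
nonzero_rows = conn.execute(
    "SELECT MIN(TableDTM) AS FirstDTM, Code FROM (" + inner + ") X "
    "WHERE Code <> 0 GROUP BY XTableDTM, Code ORDER BY FirstDTM"
).fetchall()

for row in first_rows:
    print(row)
```

The self-join compares every row with every later row, so on a large table it may be worth comparing this against a window-function approach where the database supports one.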