I have the following database table with information about people, diseases, and drugs:
PERSON_T DISEASE_T DRUG_T
========= ========== ========
PERSON_ID DISEASE_ID DRUG_ID
GENDER PERSON_ID PERSON_ID
NAME DISEASE_START_DATE DRUG_START_DATE
DISEASE_END_DATE DRUG_END_DATE
From these tables, I run some statistics about which individuals have taken which drugs and had which diseases. From this I can figure out which patterns are interesting for me to delve further into. For instance, below is a simplified example of the boolean pattern I might find for disease 52:
( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR
(Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
Edit: Here is another example:
( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR
(Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
Now I want to convert this pattern into a sql query and find all the people who match this pattern.
For example, I want to find all of the people in PERSON_T who had the disease and
((who did not take drug 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or
(who took drug 371 before exhibiting symptoms, but not drug 791 and 395 before exhibiting symptoms))
How would I go about translating this pattern back into the original query?
Here's my first attempt, but I get stuck on the first term:
SELECT * FROM PERSON_T, DRUG_T, DISEASE_T
WHERE DISEASE_ID = 52 AND
PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND
PERSON_T.PERSON_ID = DRUG_T.PERSON_ID AND
(DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)
I need this to work in PostgreSql, but I assume that any given answer can be translated from a given database to PostgreSql.
Response to comments
Edit I added my own answer. Can anyone come up with a simpler answer?
To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:
SELECT *
FROM PERSON_T INNER JOIN DISEASE_T
USING (PERSON_ID)
WHERE DISEASE_ID = 52
AND EXISTS (SELECT 1 FROM DRUG_T
WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
AND DRUG_ID = 474
AND [time condition])
AND NOT EXISTS (SELECT 1 FROM DRUG_T
WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
AND DRUG_ID = 234
AND [time condition])
...and so on. In the example, we're asking for people who have taken drug 474 but not 234. Obviously, you can group the clauses with ANDs and ORs according to what you need.
Aside: I find all caps difficult to read. I usually use uppercase for SQL keywords and lowercase for table and column names.
I have no idea how this will perform with large tables (I imagine it will be pretty lousy as date comparisons are typically pretty expensive), but here is a method that should work. It is relatively verbose, but is very easy to modify for different boolean cases.
Example 1:
SELECT dis.*
FROM disease_t dis
LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NULL OR dis.startdate < d1.startdate) AND
(d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND
(d3.person_id IS NULL OR dis.startdate < d3.startdate))
OR
((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND
(d5.person_id IS NULL OR dis.startdate < d5.startdate) AND
(d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))
Example 2:
SELECT dis.*
FROM disease_t dis
LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NOT NULL AND d1.startdate < dis.startdate) AND
(d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND
(d3.person_id IS NULL OR dis.startdate < d3.startdate))
or
((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND
(d5.person_id IS NULL OR dis.startdate < d5.startdate) AND
(d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With