Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I translate a simple boolean statement to SQL?

I have the following database table with information about people, diseases, and drugs:

PERSON_T              DISEASE_T               DRUG_T
=========             ==========              ========
PERSON_ID             DISEASE_ID              DRUG_ID
GENDER                PERSON_ID               PERSON_ID
NAME                  DISEASE_START_DATE      DRUG_START_DATE
                      DISEASE_END_DATE        DRUG_END_DATE

From these tables, I run some statistics about which individuals have taken which drugs and had which diseases. From this I can figure out which patterns are interesting for me to delve further into. For instance, below is a simplified example of the boolean pattern I might find for disease 52:

( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR 
  (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

Edit: Here is another example:

( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR 
      (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

Now I want to convert this pattern into a sql query and find all the people who match this pattern.
For example, I want to find all of the people in PERSON_T who had the disease and ((who did not take drug 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or (who took drug 371 before exhibiting symptoms, but not drug 791 and 395 before exhibiting symptoms))

How would I go about translating this pattern back into the original query?

Here's my first attempt, but I get stuck on the first term:

SELECT * FROM PERSON_T, DRUG_T, DISEASE_T 
  WHERE DISEASE_ID = 52 AND 
    PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND 
    PERSON_T.PERSON_ID = DRUG_T.PERSON_ID  AND 
    (DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)

I need this to work in PostgreSql, but I assume that any given answer can be translated from a given database to PostgreSql.

Response to comments

  1. I fixed formatting of database tables. Thank you.
  2. I need to be able to take an arbitrary boolean statement and translate it to SQL. The boolean statements we are actually creating are much longer than the example I gave. Any new tables I create will be in a new database and need to have the same schema as the original tables. This way to the end user, they can run their same code on the new tables and it work the same as if it ran on the original tables. This is a requirement from the customer. I'm hoping I can create a view which is just a query to the original tables. If we can't get that to work, I may create a copy of the tables and filter the data as I'm copying it over to the new table. We are not using neural networks to do the analysis. We are using our own custom algorithms which scale much better than neural networks.
  3. The Disease_Start_Date is the date when the person gets the diseaes which is likely when the symptons start appearing. Disease_End_Date is when the person is recovered which is likely when symptoms disappear.
  4. Drug_start_date is when the person starts taking the drugs. Drug_end_date is when the person stops taking the drugs.

Edit I added my own answer. Can anyone come up with a simpler answer?

like image 961
Jay Askren Avatar asked Jul 09 '10 13:07

Jay Askren


Video Answer


2 Answers

To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:

SELECT *
FROM PERSON_T INNER JOIN DISEASE_T
     USING (PERSON_ID)
WHERE DISEASE_ID = 52
  AND EXISTS (SELECT 1 FROM DRUG_T
              WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                AND DRUG_ID = 474
                AND [time condition])
  AND NOT EXISTS (SELECT 1 FROM DRUG_T
              WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                AND DRUG_ID = 234
                AND [time condition])

...and so on. In the example, we're asking for people who have taken drug 474 but not 234. Obviously, you can group the clauses with ANDs and ORs according to what you need.

Aside: I find all caps difficult to read. I usually use uppercase for SQL keywords and lowercase for table and column names.

like image 152
Justin K Avatar answered Oct 22 '22 16:10

Justin K


I have no idea how this will perform with large tables (I imagine it will be pretty lousy as date comparisons are typically pretty expensive), but here is a method that should work. It is relatively verbose, but is very easy to modify for different boolean cases.

Example 1:

SELECT dis.*
FROM disease_t dis
LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NULL OR dis.startdate < d1.startdate) AND
      (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND
      (d3.person_id IS NULL OR dis.startdate < d3.startdate)) 
     OR
     ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND
      (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND
      (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))

Example 2:

SELECT dis.*
FROM disease_t dis
LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NOT NULL AND d1.startdate < dis.startdate) AND
      (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND
      (d3.person_id IS NULL OR dis.startdate < d3.startdate)) 
     or
     ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND
      (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND
      (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))
like image 26
sgriffinusa Avatar answered Oct 22 '22 16:10

sgriffinusa