Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rails complex query to count unique records based on truth table

Using Rails. I have the following code:

class TypeOfBlock < ActiveRecord::Base
  has_and_belongs_to_many :patients
end

class Patient < ActiveRecord::Base
  has_and_belongs_to_many :type_of_blocks, dependent: :destroy
end

With these sets of tables:

╔══════════════╗
║type_of_blocks║
╠══════╦═══════╣
║ id   ║ name  ║
╠══════╬═══════╣
║  1   ║ UP    ║
║  2   ║ LL    ║
║  3   ║ T     ║
╚══════╩═══════╝

╔═══════════════════════════════╗
║    patients_type_of_blocks    ║ 
╠══════════════════╦════════════╣
║ type_of_block_id ║ patient_id ║
╠══════════════════╬════════════╣
║                1 ║          1 ║
║                1 ║          2 ║
║                2 ║          2 ║
║                3 ║          3 ║
║                2 ║          4 ║
║                1 ║          5 ║
║                1 ║          6 ║
║                2 ║          6 ║
║                3 ║          6 ║
╚══════════════════╩════════════╝ 

I want to count the number of unique patient depends on type of block combination, here's the expected result:

# Expected results (just like a truth table)
UP (patient with type_of_block_id 1 only) = 2 patient
UP + LL (patient with type_of_block_ids 1 and 2) = 1 patient
UP + T (patient with type_of_block_ids 1 and 3) = 0 patient
LL (patient with type_of_block_id 2 only) = 1 patient
LL + T (patient with type_of_block_ids 2 and 3) = 0 patient
T (patient with type_of_block_id 3 only) = 1 patient
UP + LL + T (patient with type_of_block_ids 1, 2 and 3) = 1 patient

I've tried to join the tables like below:

up_ll =
  TypeOfBlock.
    joins("join patients_type_of_blocks on patients_type_of_blocks.type_of_block_id = type_of_blocks.id").
    where("patients_type_of_blocks.type_of_block_id = 1 and patients_type_of_blocks.type_of_block_id = 2").
    size

But there are just too much complexity, and the number is wrong. I wanted to try raw SQL, but Rails 4 deprecated it and requires me to do ModelClass.find_by_sql.

How can I generate the above expected results?

like image 514
Victor Avatar asked Mar 30 '16 16:03

Victor


2 Answers

The only solution that comes to my mind is to use raw SQL and leverage the group_concat function, as shown here.

The SQL needed is this:

SELECT
  combination,
  count(*) as cnt
FROM (
       SELECT
         ptb.patient_id,
         group_concat(tb.name ORDER BY tb.name) AS combination
       FROM type_of_blocks tb
       INNER JOIN patients_type_of_blocks ptb ON ptb.type_of_block_id = tb.id
       GROUP BY ptb.patient_id) patient_combinations
GROUP BY combination;

The inner select groups by the patients and selects the combinations of block types each of the patient has. The outer select then simply counts the patients in each combination.

The query returns the following (see the SQL fiddle):

combination     cnt
LL              1
LL,T,UP         1
LL,UP           1
T               1
UP              2

As you can see, the query does not return zero counts, this has to be solved in ruby code (perhaps initialize a hash with all combinations with zeroes and then merge with the query counts).

To integrate this query to ruby, simply use the find_by_sql method on any model (and for example convert the results to a hash):

sql = <<-EOF
        ...the query from above...
        EOF

TypeOfBlock.find_by_sql(sql).to_a.reduce({}) { |h, u| h[u.combination] = u.cnt; h }
# => { "LL" => 1, "LL,T,UP" => 1, "LL,UP" => 1, "T" => 1, "UP" => 2 }
like image 175
Matouš Borák Avatar answered Nov 14 '22 13:11

Matouš Borák


The answer provided by BoraMa is correct. I just want to address:

As you can see, the query does not return zero counts, this has to be solved in ruby code (perhaps initialize a hash with all combinations with zeroes and then merge with the query counts).

It could be achieved by using pure MySQL:

SELECT sub.combination, COALESCE(cnt, 0) AS cnt
FROM (SELECT GROUP_CONCAT(Name ORDER BY Name SEPARATOR ' + ') AS combination
      FROM (SELECT p.Name, p.rn, LPAD(BIN(u.N + t.N * 10), size, '0') bitmap
            FROM (SELECT @rownum := @rownum + 1 rn, id, Name
                  FROM type_of_blocks, (SELECT @rownum := 0) r) p
            CROSS JOIN (SELECT 0 N UNION ALL SELECT 1 
                    UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
                    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
                    UNION ALL SELECT 8 UNION ALL SELECT 9) u
             CROSS JOIN (SELECT 0 N UNION ALL SELECT 1 
                    UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
                    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
                    UNION ALL SELECT 8 UNION ALL SELECT 9) t
             CROSS JOIN (SELECT COUNT(*) AS size FROM type_of_blocks) o
             WHERE u.N + t.N * 10 < POW(2, size)
             ) b
       WHERE SUBSTRING(bitmap, rn, 1) = '1'
       GROUP BY bitmap
) AS sub
LEFT JOIN (
    SELECT combination, COUNT(*) AS cnt
    FROM (SELECT ptb.patient_id,
                GROUP_CONCAT(tb.name ORDER BY tb.name SEPARATOR ' + ') AS combination
          FROM type_of_blocks tb
          JOIN patients_type_of_blocks ptb 
            ON ptb.type_of_block_id = tb.id
          GROUP BY ptb.patient_id) patient_combinations
    GROUP BY combination   
) AS sub2
  ON sub.combination = sub2.combination
ORDER BY LENGTH(sub.combination), sub.combination; 

SQLFiddleDemo

Output:

╔══════════════╦═════╗
║ combination  ║ cnt ║
╠══════════════╬═════╣
║ T            ║   1 ║
║ LL           ║   1 ║
║ UP           ║   2 ║
║ LL + T       ║   0 ║
║ T + UP       ║   0 ║
║ LL + UP      ║   1 ║
║ LL + T + UP  ║   1 ║
╚══════════════╩═════╝

How it works:

  1. Generate all possible combinations using method described by Serpiton (with slight improvements)
  2. Calculate available combinations
  3. Combine both results

To better understand how it works Postgresql version of generating all cominations:

WITH all_combinations AS (
    SELECT string_agg(b.Name ,' + ' ORDER BY b.Name) AS combination
    FROM (SELECT p.Name, p.rn, RIGHT(o.n::bit(16)::text, size) AS bitmap
          FROM (SELECT *, ROW_NUMBER() OVER(ORDER BY id)::int AS  rn
                FROM type_of_blocks )AS p
          CROSS JOIN generate_series(1, 100000) AS o(n)     
          ,LATERAL(SELECT COUNT(*)::int AS size FROM type_of_blocks) AS s
          WHERE o.n < 2 ^ size
         ) b
    WHERE SUBSTRING(b.bitmap, b.rn, 1) = '1'
    GROUP BY b.bitmap
)
SELECT sub.combination, COALESCE(sub2.cnt, 0) AS cnt
FROM all_combinations sub
LEFT JOIN (SELECT combination, COUNT(*) AS cnt
           FROM (SELECT ptb.patient_id,
                 string_agg(tb.name,' + ' ORDER BY tb.name) AS combination
                 FROM type_of_blocks tb
                 JOIN patients_type_of_blocks ptb 
                   ON ptb.type_of_block_id = tb.id
                 GROUP BY ptb.patient_id) patient_combinations
           GROUP BY combination) AS sub2
  ON sub.combination = sub2.combination
ORDER BY LENGTH(sub.combination), sub.combination; 

SqlFiddleDemo2

like image 35
Lukasz Szozda Avatar answered Nov 14 '22 12:11

Lukasz Szozda