Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most common values for a group dependent on a select query

Tags:

mysql

I'm breaking my head over how to do this one in SQL. I have a table:

| User_id | Question_ID | Answer_ID |
|    1    |     1       |    1      |
|    1    |     2       |    10     |
|    2    |     1       |    2      |
|    2    |     2       |    11     |
|    3    |     1       |    1      |
|    3    |     2       |    10     |
|    4    |     1       |    1      |
|    4    |     2       |    10     |

It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})

I'm trying to find an answer to this query: For a particular question and answer id (Related to the same question), I want to find the most common answer given to OTHER question by users with the given answer.

For example, For the above table:

For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
                       For Answer_ID = 2 - (Question 2 - Answer ID 11)

Is it possible to do in one query? Should it be done in one query? Shall I just use stored procedure or Java for that one?

like image 866
StationaryTraveller Avatar asked Jul 03 '17 17:07

StationaryTraveller


2 Answers

Though @rick-james is right, I am not sure that it is easy to start when you do not not how the queries like this are usually written for MySQL.

  1. You need a query to find out the most common answers to questions:

    SELECT 
      question_id, 
      answer_id, 
      COUNT(*) as cnt 
    FROM user_answers
    GROUP BY 1, 2
    ORDER BY 1, 3 DESC
    

    This would return a table where for each question_id we output counts in descending order.

    | 1 |  1 | 3 |
    | 1 |  2 | 1 |
    | 2 | 10 | 3 |
    | 2 | 11 | 1 |
    
  2. And now we should solve a so called greatest-n-per-group task. The problem is that in MySQL for the sake of performance the tasks like this are usually solved not in pure SQL, but using hacks which rest on knowledge how the queries are processed internally.

    In this case we know that we can define a variable and then iterating over the ready table, have knowledge about the previous row, which allows us to distinguish between the first row in a group and the others.

    SELECT 
      question_id, answer_id, cnt,
      IF(question_id=@q_id, NULL, @q_id:=question_id) as v
    FROM (
      SELECT 
         question_id, answer_id, COUNT(*) as cnt 
      FROM user_answers
      GROUP BY 1, 2
      ORDER BY 1, 3 DESC) cnts
    JOIN (
      SELECT @q_id:=-1
    ) as init;
    

    Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly casted later). Here is the result:

    | 1 |  1 | 3 |    1 |
    | 1 |  2 | 1 |(null)|
    | 2 | 10 | 3 |    2 |
    | 2 | 11 | 1 |(null)|
    
  3. Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:

    SELECT 
      question_id, answer_id
    FROM (
      SELECT 
        question_id, answer_id
      FROM user_answers
      GROUP BY 1, 2
      ORDER BY 1, COUNT(*) DESC) cnts
    JOIN (
      SELECT @q_id:=-1
    ) as init
    WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;
    
  4. The last thing worth mentioning, for the query to be efficient you should have correct indexes. This query requires an index starting with (question_id, answer_id) columns. Since you anyway need a UNIQUE index, it make sense to define it in this order: (question_id, answer_id, user_id).

    CREATE TABLE user_answers (
      user_id INTEGER,
      question_id INTEGER,
      answer_id INTEGER,
      UNIQUE INDEX (question_id, answer_id, user_id) 
    ) engine=InnoDB;
    

Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.

like image 85
newtover Avatar answered Oct 22 '22 10:10

newtover


Do you want a fish? Or do you want to learn how to fish?

Your question seems to have multiple steps.

  1. Fetch info about "questions by users with the given answer". Devise this SELECT and imagine that the results form a new table.

  2. Apply the "OTHER" restriction. This is probably a minor AND ... != ... added to SELECT #1.

  3. Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1. It is likely to

use a derived table:

SELECT ...
    FROM ( select#2 )
like image 33
Rick James Avatar answered Oct 22 '22 11:10

Rick James