Most common values for a group dependent on a select query

Question

I'm breaking my head over how to do this one in SQL. I have a table:

| User_id | Question_ID | Answer_ID |
|    1    |     1       |    1      |
|    1    |     2       |    10     |
|    2    |     1       |    2      |
|    2    |     2       |    11     |
|    3    |     1       |    1      |
|    3    |     2       |    10     |
|    4    |     1       |    1      |
|    4    |     2       |    10     |

It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})

I'm trying to find an answer to this query: For a particular question and answer id (Related to the same question), I want to find the most common answer given to OTHER question by users with the given answer.

For example, For the above table:

For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
                       For Answer_ID = 2 - (Question 2 - Answer ID 11)

Is it possible to do in one query? Should it be done in one query? Shall I just use stored procedure or Java for that one?

newtover · Accepted Answer

Though @rick-james is right, I am not sure that it is easy to start when you do not not how the queries like this are usually written for MySQL.

You need a query to find out the most common answers to questions:

SELECT 
  question_id, 
  answer_id, 
  COUNT(*) as cnt 
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC

This would return a table where for each question_id we output counts in descending order.

| 1 |  1 | 3 |
| 1 |  2 | 1 |
| 2 | 10 | 3 |
| 2 | 11 | 1 |

And now we should solve a so called greatest-n-per-group task. The problem is that in MySQL for the sake of performance the tasks like this are usually solved not in pure SQL, but using hacks which rest on knowledge how the queries are processed internally.

In this case we know that we can define a variable and then iterating over the ready table, have knowledge about the previous row, which allows us to distinguish between the first row in a group and the others.
```
SELECT 
  question_id, answer_id, cnt,
  IF(question_id=@q_id, NULL, @q_id:=question_id) as v
FROM (
  SELECT 
     question_id, answer_id, COUNT(*) as cnt 
  FROM user_answers
  GROUP BY 1, 2
  ORDER BY 1, 3 DESC) cnts
JOIN (
  SELECT @q_id:=-1
) as init;
```
Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly casted later). Here is the result:
```
| 1 |  1 | 3 |    1 |
| 1 |  2 | 1 |(null)|
| 2 | 10 | 3 |    2 |
| 2 | 11 | 1 |(null)|
```

Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:

SELECT 
  question_id, answer_id
FROM (
  SELECT 
    question_id, answer_id
  FROM user_answers
  GROUP BY 1, 2
  ORDER BY 1, COUNT(*) DESC) cnts
JOIN (
  SELECT @q_id:=-1
) as init
WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;

The last thing worth mentioning, for the query to be efficient you should have correct indexes. This query requires an index starting with (question_id, answer_id) columns. Since you anyway need a UNIQUE index, it make sense to define it in this order: (question_id, answer_id, user_id).
```
CREATE TABLE user_answers (
  user_id INTEGER,
  question_id INTEGER,
  answer_id INTEGER,
  UNIQUE INDEX (question_id, answer_id, user_id) 
) engine=InnoDB;
```

Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.

Rick James · Answer

Do you want a fish? Or do you want to learn how to fish?

Your question seems to have multiple steps.

Fetch info about "questions by users with the given answer". Devise this SELECT and imagine that the results form a new table.
Apply the "OTHER" restriction. This is probably a minor AND ... != ... added to SELECT #1.
Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1. It is likely to

use a derived table:

SELECT ...
    FROM ( select#2 )

Most common values for a group dependent on a select query

Tags:

mysql

StationaryTraveller

2 Answers

newtover

Rick James

Recent Activity

Donate For Us

Most common values for a group dependent on a select query

Tags:

mysql

StationaryTraveller

2 Answers

newtover

Rick James

Related questions

Recent Activity

Donate For Us