I'm breaking my head over how to do this one in SQL. I have a table:
| User_id | Question_ID | Answer_ID |
| 1 | 1 | 1 |
| 1 | 2 | 10 |
| 2 | 1 | 2 |
| 2 | 2 | 11 |
| 3 | 1 | 1 |
| 3 | 2 | 10 |
| 4 | 1 | 1 |
| 4 | 2 | 10 |
It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})
I'm trying to find an answer to this query: For a particular question and answer id (Related to the same question), I want to find the most common answer given to OTHER question by users with the given answer.
For example, For the above table:
For question_id = 1 -> For Answer_ID = 1 - (Question 2 - Answer ID 10)
For Answer_ID = 2 - (Question 2 - Answer ID 11)
Is it possible to do in one query? Should it be done in one query? Shall I just use stored procedure or Java for that one?
Though @rick-james is right, I am not sure that it is easy to start when you do not not how the queries like this are usually written for MySQL.
You need a query to find out the most common answers to questions:
SELECT
question_id,
answer_id,
COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC
This would return a table where for each question_id we output counts in descending order.
| 1 | 1 | 3 |
| 1 | 2 | 1 |
| 2 | 10 | 3 |
| 2 | 11 | 1 |
And now we should solve a so called greatest-n-per-group task. The problem is that in MySQL for the sake of performance the tasks like this are usually solved not in pure SQL, but using hacks which rest on knowledge how the queries are processed internally.
In this case we know that we can define a variable and then iterating over the ready table, have knowledge about the previous row, which allows us to distinguish between the first row in a group and the others.
SELECT
question_id, answer_id, cnt,
IF(question_id=@q_id, NULL, @q_id:=question_id) as v
FROM (
SELECT
question_id, answer_id, COUNT(*) as cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC) cnts
JOIN (
SELECT @q_id:=-1
) as init;
Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly casted later). Here is the result:
| 1 | 1 | 3 | 1 |
| 1 | 2 | 1 |(null)|
| 2 | 10 | 3 | 2 |
| 2 | 11 | 1 |(null)|
Now we just need to filter out rows with NULL in the last column. Since the column is actually not needed we can move the same expression into the WHERE clause. The cnt column is actually not needed either, so we can skip it as well:
SELECT
question_id, answer_id
FROM (
SELECT
question_id, answer_id
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, COUNT(*) DESC) cnts
JOIN (
SELECT @q_id:=-1
) as init
WHERE IF(question_id=@q_id, NULL, @q_id:=question_id) IS NOT NULL;
The last thing worth mentioning, for the query to be efficient you should have correct indexes. This query requires an index starting with (question_id, answer_id) columns. Since you anyway need a UNIQUE index, it make sense to define it in this order: (question_id, answer_id, user_id).
CREATE TABLE user_answers (
user_id INTEGER,
question_id INTEGER,
answer_id INTEGER,
UNIQUE INDEX (question_id, answer_id, user_id)
) engine=InnoDB;
Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.
Do you want a fish? Or do you want to learn how to fish?
Your question seems to have multiple steps.
Fetch info about "questions by users with the given answer". Devise this SELECT
and imagine that the results form a new table.
Apply the "OTHER" restriction. This is probably a minor AND ... != ...
added to SELECT #1
.
Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1
. It is likely to
use a derived table:
SELECT ...
FROM ( select#2 )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With