Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which approach is best for storing a list of words in mysql that will later be used for statistics?

DETAILS

I have a quiz (let’s call it quiz1). Quiz1 uses the same wordlist each time it is generated. If the user needs to, they can skip words to complete the quiz. I’d like to store those skipped words in mysql and then later perform statistics on them.

At first I was going to store the missed words in one column as a string. Each word would be separated by a comma.

|testid |       missedwords                     | score     |   userid  |
*************************************************************************
| quiz1 | wordlist,missed,skipped,words         |  59       |   1       |
| quiz2 | different,quiz,list                   |  65       |   1       |

The problem with this approach is that I want to show statistics at the end of each quiz about which words were most frequently missed by users who took quiz1.
I’m assuming that storing missed words in one column as above is inefficient for this purpose as I'd need to extract the information and then tally it -(probably tally using php- unless I stored that tallied data in a separate table).

I then thought perhaps I need to create a separate table for the missed words The advantage of the below table is that it should be easy to tally the words from the table below.

|Instance|  missed word     |
*****************************
|   1    |  wordlist        |
|   1    |  missed          |
|   1    |  skipped         |

Another approach I could create a table with tallys and update it each time quiz1 was taken.

Testid  |   wordlist|   missed| skipped| otherword|
**************************************************
Quiz1   |        1  |        1|       1| 0        |

The problem with this approach is that I would need a different table for each quiz, because each quiz will use different words. Also information is lost because only the tally is kept not the related data such which user missed which words.

Question

Which approach would you use? Why? Alternative approaches to this task are welcome. If you see any flaws in my logic please feel free to point them out.

EDIT Users will be able to retake the quiz as many times as they like. Their information will not be updated, instead a new instance would be created for each quiz they retook.

like image 635
TryHarder Avatar asked Apr 24 '13 08:04

TryHarder


3 Answers

The best way to do this is to have the word collection completely normalized. This way, analyses will be easy and fast.

quiz_words with wordID, word
quiz_skipped_words with quizID, userID, wordID

To get all the skipped words of a user:

SELECT wordID, word 
FROM quiz_words 
JOIN quiz_skipped_words USING (wordID) 
WHERE userID = ?;

You could add a group by clause to have group counts of the same word.

To get the count of a specific word:

SELECT COUNT(*) 
FROM quiz_words 
WHERE word LIKE '?';
like image 145
markus Avatar answered Nov 17 '22 03:11

markus


According to database normalization theory, second approach is better, because ideally one relational table cell should store only one value, which is atomic and unsplitable. Each word is an entity instance.

Also, I might suggest to not create Quiz-Word tables, but reserve another column in Missed-Word table for quiz, for which this word was specified, then use this column as a foreign key for Quiz table. Then you probably may avoid real time table generation (which is a "bad practice" in database design).

like image 24
BlitZ Avatar answered Nov 17 '22 05:11

BlitZ


why not have a quiz table and quiz_words table, the quiz_words table would store id,quizID,word as columns. Then for each quiz instance create records in the quiz_words table for each word the user did use.

You could then run mysql counts on the quiz_words table based on quizID and or quiz type

like image 1
MadDokMike Avatar answered Nov 17 '22 03:11

MadDokMike