I am working on a multiple choice online test project here i have designed database to store result but looking for more optimized way.
Requirements:
My design:
tables:
students
stud_id, name, email
tests
test_id, testname, duration
questions
que_id, question, opt1, opt2, opt3, opt4, answer, test_id
answers
stud_id, que_id, answer
By this way answers can be stored but it increase the number of records as for every question solved by student new record will be added in answers table.
e.g. One test consists 100 questions and 1000 students take that test, for every student there will be 100 records for each question and for 1000 students 100k records.
Is there any better way to do this where number of records will be less.
ASP.NET is the preferred web technology. The system carries out the examination and auto-grading for multiple choice questions which is feed into the system. KEY WORDS: Online, Examination, System, Auto-grading, Web-based.
You have done good work. As far as the data is concerned, the design is correct, but incomplete. There are two errors:
opt1…opt4
is a repeating group, that breaks 2NF. It must be placed in a separate table.Further, there seems to be no option name or descriptor, which is strange (what do you paint on the page, next to each radio button?)
If you ever add a fifth option, that is now catered for; if you have questions with less than four options, that is now catered for.
Conversely, you have a fixed set of columns, and if there are any such changes in the future, you have to change both the database and the existing code. And the code will be horrendous (extra processing instead of direct SELECTs)
answers
table has no integrity. As it stands, answers can be recorded against a question that the student was not asked, or for a test that the student did not sit. Prevention of that type of error is ordinary fare in a Relational Database, and it is not possible in a Record Filing System.In these dark days of IT, this is a common trend. People focus on the data values; they imagine the values in spreadsheet form, and they go directly to implementing object that contain those values. Instead of understanding the data and what it means.
answers(stud_id, que_id, answer)
has no meaning, no integrity, unless the context of a student_test is asserted.
test_question.
By this way answers can be stored but it increase the number of records as for every question solved by student new record will be added in answers table.
Yes. That is normal for a database.
Is there any better way to do this where number of records will be less.
For a Record Filing System, yes. For a database, no. Since you have tagged your question as database-design, I will assume that that is what you want.
A database is a collection of facts, not of records with related fields. The facts are about the real world, limited to the scope of the database and app.
It is important to determine the discrete facts that we need, because subordinate facts depend on higher-order facts. That is database design. And we Normalise the data, as we progress, as part of one and the same exercise. Normalisation has the purpose of eliminating duplication, otherwise you have Update Anomalies. And we determine Relational Keys, as we progress, again as part of one and the same exercise. Relational Keys provide the logical structure of a Relational database, ie. the logical integrity.
e.g. One test consists 100 questions and 1000 students take that test, for every student there will be 100 records for each question and for 1000 students 100k records.
Yes. But that is expressed in ISAM record-processing terms. In database terms, you cannot get around the fact that the database stores:
facts about 100 questions
facts about 1,000 students
facts about 1,000 students times the 100 choices they made
You need to get your head around two things: the large number of discrete facts; and the use of compound Keys. Both are essential to Relational databases. If either of those are missing, or you implement them with reluctance, you will not have the integrity, power, or speed of a Relational database, you will have a pre-1970's ISAM Record Filing System.
Further, the SQL platforms, and to some degree the non-SQL platforms such as MySQL, are heavily optimised for processing sets of data (not record-by-record); heavy I/O and caching; etc. If you implement the structures required for high concurrency, you will obtain even more performance.
As far as the implementation is concerned, and particularly since you are concerned about performance, there are errors. A restatement would be, the implementation should not be attempted until the data is understood and modelled correctly.
The problem across the board, is that you have added a surrogate (there is no such thing as "surrogate key", it is simply a surrogate, a physical record id). It is far to early in the modelling exercise; it hasn't progressed enough; the model is not stable, to add surrogates.
Surrogates are always an additional column plus the underlying index. Obviously that consumes resources, and has a cost on inserts and deletes.
Surrogates do not provide row uniqueness, which is demanded in a relational database.
The Relational Model demands that Keys are made up from the data. Relational Keys provide row uniqueness.
A surrogate isn't made up from the data. Therefore it is not a Relational Key, and it does not provide any of the qualities of one.
If a surrogate is used, it does not replace the Key, it is in addition to the Key. Which is why we evaluate the need for surrogates after, not before, modelling the data. It is an implementation concern, not a modelling one.
Rather than going back and forth, let me provide the proposal, and you can discuss it.
Student Test Data Model (Page 1 only, for those following the progression).
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation.
For test
and question.
I have left id
columns in, but note that you will be much better off with short, meaningful codes.
student_id
is valid because both name
and email
are too large to migrate to the child tables.
Please check the Verb Phrases carefully, they comprise a set of Predicates. The remainder of the Predicates can be determined directly from the model. If this is not clear, please ask.
See if you can determine that this is a collection of facts, and each fact is discrete precisely because other facts depend on it; that it is not a collection of records with fields that are related.
Your
answers
table has no integrity. As it stands, answers can be recorded against a question that the student was not asked, or for a test that the student did not sit. Prevention of that type of error is ordinary fare in a Relational Database, and it is not possible in a Record Filing System.
answers
table, now named student_response,
now has some integrity. A student
is registered for a test in student_test,
and the student_responses
are constrained to student_test.
Please comment/discuss.
I will add additional table subject (subject_id, subject_name) and add that subject_id in question table as FK is this okay?
Yes, by all means. But that has consequences. Some advice to make sure we do that properly, across the board:
ID
columns, read this Answer.Subject is important. It is the context in which (a) a question
exists, and (b) a test
exists. They did exist as independent items (page 1 of the DM), but now they are subordinate to subject.
The addition substantially improves data integrity.
The fact of a student registration and the fact of a student sitting for a test, are discrete and separate facts.
Gratefully, that eliminated two surrogates question_id
and test_id.
Short codes
such as CHAR(2)
are easier and more meaningful.
Note the improvement in the table names, improved clarity.
I have updated the Student Test Data Model (Page 2 only, for those following the progression).
However, that exposes something (that is why we model data, paper is cheap, many drafts are normal). If we evaluate the Predicates (readily visible in the Data Model, as detailed in the IDEF1X Notation document):
each subject_test was taken by 0-to-n student_tests
each student_test is [a taking of] 1 subject_test
each student took 0-to-n student_tests
each student_test is taken by 1 student
those Predicates are not accurate. A student
can sit for a test
in any subject.
Given the new subject
table, I would think that we want students
to be registered for subjects,
and therefore student_test
to be constrained to subjects
that the student
is registered for.
I have updated the Student Test Data Model (Page 3). Now we have even more integrity, such that student_test
is constrained to subjects
that the student
is registered for. The relevant Predicates are:
each student registered for 0-to-n student_subjects
each student_subject is a registration of 1 student
each subject attracted 0-to-n student_subjects
each student_subject is an attraction of 1 subject
each subject_test was taken by 0-to-n student_tests
each student_test is [a taking of] 1 subject_test
each student_subject took 0-to-n student_tests
each student_test is taken by 1 student_subjects
Now the data model appears to be complete.
Context is everything in a database.
The data hierarchies are plainly visible in the compounding of the Keys.
Notice that it is the Relational Keys, in the child tables, that provide Relational Integrity with the parent tables, to every higher level (parent, grandparent) in the hierarchy.
In case it is not obvious, notice the power of Relational Joins. Something you cannot do with Record Filing Systems that have ID
fields in every File. Eg:
- Join `student_response` directly to `subject` on `subject_code`, without having to navigate the two levels in-between
- Join `student_response` directly to `student` on `student_id`, without having to navigate the two levels in-between
No, there is no better design, because the design has nothing to do with how many records will be in the tables. You will choose the same design, no matter whether you deal with ten students or ten thousand.
Your table design looks good. Don't worry about the number of records. A dbms is made to deal with large tables. And 100k records is still a small database. I wouldn't even change this design if there were billions of answers to store.
If you want to normalize the data, then I'd create the tables a little differently.
Your Student table looks fine. Generally, I use a singular name for tables, rather than plural.
Student
-------
Student ID
Name
Email
...
Here's the Test table:
Test
----
Test ID
Test Name
...
We tie students to tests with a junction table.
StudentTest
-----------
Student ID
Test ID
Test Started Timestamp
Test Duration
...
The time and length of the test vary from student to student, so those columns are included on the StudentTest table.
The Question table.
Question
--------
Question ID
Question Text
And the Answer table.
Answer
------
Answer ID
Answer Text
Now here's where things get tricky. You could assign questions to a test based on the ID, like this.
TestQuestion
------------
Test ID
Question ID
But if you do that, and someone changes the Question text after the test, then the Question ID is pointing to a different question than the question on the test.
To solve this problem, we create history tables like this:
QuestionHistory
---------------
QuestionHistory ID
Question Text
AnswerHistory
-------------
AnswerHistory ID
Answer Text
So, we create the TestQuestion table like this:
TestQuestion
------------
Test ID
QuestionHistory ID
And copy the questions as well as the answers to the history tables.
For similar reasons, we create the QuestionAnswer table like this:
QuestionAnswer
--------------
QuestionHistory ID
AnswerHistory ID
Is Correct Answer
Your code could make sure that each question has 4 possible answers. The database allows for more or less than 4 possible answers.
Finally, we tie the student's answers to the test questions.
StudentQuestionAnswer
---------------------
Student ID
Test ID
QuestionHistory ID
AnswerHistory ID
Is Correct Answer
Yes, the Test ID column is duplicated here. This is so you can query by the test as well as the student that took the test.
The Is Correct Answer field has a different meaning in the QuestionAnswer table and the StudentQuestionAnswer table. In the QuestionAnswer table, the Is Correct Answer boolean points to the correct answer. In the StudentQuestionAnswer table, the Is Correct Answer boolean signifies that the student answered the question correctly.
This should be a complete question / answer database. You could tie tests to courses if you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With