What's the best way to deal with the 8 different SQL questions below.
I have placed below a database schema, how it is represented in my Rails models, and seven questions for data I need to get out of my database. Some questions I have answered, others I'm not sure of the best solution.
Question #7 is a curve ball, because it potentially changes the answers to all the other questions.
Students ------- ID Name Courses ----- ID Name Grade Enrollments ---------- ID Student_ID Course_ID
class Course < ActiveRecord::Base
has_many :enrollments
has_many :students, :through=>:enrollments
end
class Enrollment < ActiveRecord::Base
belongs_to :student
belongs_to :course
end
class Student < ActiveRecord::Base
has_many :enrollments
has_many :courses, :through => :enrollments
end
SELECT s.* FROM Students s
LEFT JOIN Enrollments e on e.student_id = s.id
LEFT JOIN Courses c on e.course_id = c.id
WHERE c.grade = 9 AND c.name = 'Math'
This one is simple. ActiveRecord handles this well
c = Course.where(:grade=>9).where(:name=>'Math').first
c.students
SELECT c.* FROM Courses c
LEFT JOIN Enrollments e on c.id = e.course_id
LEFT JOIN Students s on e.student_id = s.id
WHERE s.name = 'John'
Again, simple.
s = Student.where(:name=>'John').first
s.courses
SELECT c.*, count(e.student_id) FROM Courses C
LEFT JOIN Enrollments e on c.id = e.course_id
WHERE c.grade = 9 GROUP BY c.id
Counter Cache will work nicely here.
class AddCounters < ActiveRecord::Migration def up add_column :students, :courses_count, :integer, :default=>0 add_column :courses, :students_count, :integer, :default=>0 Student.reset_column_information Student.all.each do |s| Student.update_counters s.id, :courses_count => s.courses.length end Course.reset_column_information Course.all.each do |c| Course.update_counters c.id, :students_count => c.students.length end end def down remove_column :students, :courses_count remove_column :courses, :students_count end end
ActiveRecord
Course.where(:grade=>9).each do |c| puts "#{c.name} - #{c.students.size}" end
Not sure of the best solution. This would be VERY messy to do in SQL without keeping a counter cache for number of courses per grade level on each student. I could add a hook to update this information myself. I don't want to pull all students and courses and count them in post processing.
The following solution produces a lot of queries. Preloading the courses may not be possible. (For example, the students are coming from the association on a course)
students = some_course.students
matching_students = []
students.each do |s|
courses_9 = 0
courses_10 = 0
courses_11 = 0
s.courses.each do |c|
courses_9 += 1 if c.grade == 9
courses_10 += 1 if c.grade == 10
courses_11 += 1 if c.grade == 11
end
if courses_11 <= 3 && courses_10 > 1 && courses_9 == 0
matching_students << s
end
end
return matching_students
SELECT s.*, count(e.course_id) as num_Courses FROM Students s
INNER JOIN Enrollments e on s.id = e.student_id
INNER JOIN Courses c on e.course_id = c.id AND c.name = 'Math'
GROUP BY s.id HAVING num_Courses > 0
Or
SELECT DISTINCT s.* FROM Students s
INNER JOIN Enrollments e_math_1 on e_math_1.student_id = s.id
INNER JOIN Courses c_math_1 ON e_math_1.course_id = c_math_1.id AND c_math_1.name = 'Math'
INNER JOIN Enrollments e_math_2 on e_math_2.student_id = s.id
INNER JOIN Courses c_math_2 ON e_math_2.course_id = c_math_2.id AND c_math_2.name = 'Math'
WHERE c_math_1.id != c_math_2.id
Not sure of the best solution. The tricky part to this is that the ActiveRecord (or NoSQL) solution can't retrieve all students, and looking at their courses afterwards, because that would be too slow.
students = SomeObject.students
multiple_math_course_students = []
students.each do |s|
has_math_course = false
add_student = false
s.courses.each do |c|
if c.name == 'Math'
if has_math_course
add_student = true
else
has_math_course = true
end
end
end
multiple_math_course_students << s if add_student
end
SELECT s.* FROM Students s
INNER JOIN Enrollments e_math on e_math.student_id = s.id
INNER JOIN Courses c_math ON e_math.course_id = c_math.id
INNER JOIN Enrollments e_science on e_science.student_id = s.id
INNER JOIN Courses c_science on e_science.course_id = c_science.id WHERE c_math.name = 'Math' AND c_science.name = 'Science'
This involves joining to the same table (or in Rails, association) twice. Is there a way to do this smoothly with ActiveRecord's AREL wrapper? You could make a separate association for science classes and math classes, allowing you to do separate operations on each, but this won't work in the case of #7 below.
students = SomeObject.students
math_and_science_students = []
students.each do |s|
has_math_course = false
has_science_course = false
s.courses.each do |c|
has_math_course = true if c.name == 'Math'
has_science_course = true if c.name == 'Science'
end
math_and_science_students << s if has_math_course && has_science_course
end
It would not be acceptable to query the database for every student record. A page which displays 100 students would require 100 queries. At this point, I want to denormalize the database by putting a flag in the student table with "highest level course". Is this my best course of action? Would it be better to use a different data store other than a relational database from the start?
Imagine that the customer asked for any arbitrary data to be shown as a badge: Highest Grade Level, Number of Math Courses Taken, Gold Badge if taking Math, Science and History all together, etc. Should each of these cases be a call for denormalization of the database? Should denormalized data be kept in the same relational database as normalized data?
The goal of denormalization is to move data from normalized tables back into a single table to have the data where it is needed. For example, if a query joins multiple tables to get the data but indexing is not sufficient, denormalizing may be better.
Denormalization has a place with SQL and NoSQL databases, as well as in data warehousing.
Redundant data is eliminated when normalization is performed whereas denormalization increases the redundant data. Normalization increases the number of tables and joins.
Only if you need your database to perform better at particular tasks (such as reporting) should you opt for denormalization. If you do denormalize, be careful and make sure to document all changes you make to the database.
First, I think your database schema is fine. I would NOT de-normalize based upon these use cases, as they are very common.
Second, you have to learn to distinguish between Persistence, business logic and reports. ActiveRecord is good for basic persistence and encapsulating business logic. It handles the CRUD stuff and lets you put a lot of the logic of your application in the model. However, a lot of the logic you are talking about sounds like reports, especially #6. You are going to have to accept that for some kind of querying logic like this, raw SQL is going to be your best bet. I think the cache counters you have implemented might help you stay in active record and models if you are more comfortable there, but most likely you will have to drop to plain sql as you have done for several of these solutions. Reports in general require straight sql.
A normalized database is crucial to good application design. Its is really important for making your code clean for OLTP transaction and business logic. Don't denormalize just because you have to do some joins in sql. That is what sql is good at. All you are going to do by denormalizing is making some of your reporting logic faster and easier at the expensive of making your persistence and OLTP logic slower and harder.
So i would start out keeping your normalized database. If you need to join on a related table you can often use activerecord's include method to do this without resorting to regular sql. To do things like counts based on joins you'll have to use plain sql.
Eventually, if your database gets very large with lots of data, your reports will be slow because of all the joins you'll have to do. This is FINE. AT that point and no sooner, start considering making a separate reporting database that is denormalized that you can update hourly, nightly, weekly etc from the normalized database. Then move your reporting logic to query the reporting database without having to do joins. There is no need to start off this way however. You're just incurring extra complexity and expense without being certain of the payoff. Maybe your reporting sql with joins will work indefinitely without denormalization with the use of indexes. Don't prematurely optimize.
I don't think nosql is necessarily the answer either. From what little I know, NoSQL works well for specific uses cases. your application's uses cases and schema seem suited fine to relational databases.
Overall, I think the combination of raw sql (not arel/activerecord) and counters you have implemented are fine.
I'm running into the same issue at the moment. From my research there are a few ways to get around it.
First of all, I believe any application will run into these issues. The basic idea is that we model our data in a normalized fashion which inherently becomes slow and cumbersome when there's a lot of data and the data span across multiple tables.
The best approach I've been able to come up with is the following:
These two should give a lot of flexibility for the application and provide many convenience methods as well as solve most of the questions I'm trying to answer
Once I need to do a bunch of joins to get what I need and I feel like I should denormalize the tables to easily get to what I need, I consider the following:
SQL views: These are pre-defined sql statements, joins for example, to which I can link models to. Generally this is way faster than querying via ActiveRecord http://hashrocket.com/blog/posts/sql-views-and-activerecord
aggregate table: Create one or more aggregate tables and update asynchronously using delayed_job, resque for example. These aggregate can get updated once a day for example and the model can query it directly. Note that this is some sort of denormalized table.
Couchbase (NOSQL) I haven't used this one but it looks very interesting. http://couchbaseonrails.com/understand
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With