I have been searching all over the web and I have no clue.
Basically, there are two options :
1) Retrieve all rows from the database using Subscriber.all
and aggregate by day in the Rails app using the Enumerable.group_by
:
@subscribers = Subscriber.all
@subscriptions_per_day = @subscribers.group_by { |s| s.created_at.beginning_of_day }
I think this is a really bad idea. Retrieving all rows from the database can be acceptable for a small application, but it will not scale at all. Database aggregate and date functions to the rescue !
2) Run a SQL query in the database using aggregate and date functions :
Subscriber.select('STRFTIME("%Y-%m-%d", created_at) AS day, COUNT(*) AS subscriptions').group('day')
Which will run in this SQL query :
SELECT STRFTIME("%Y-%m-%d", created_at) AS day, COUNT(*) AS subscriptions
FROM subscribers
GROUP BY day
Much better. Now aggregates are done in the database which is optimized for this kind of task, and only one row per day is returned from the database to the Rails app.
... but wait... now the app has to go live in my production env which uses MySQL !
Replace STRFTIME()
with DATE_FORMAT()
.
What if tomorrow I switch to PostgreSQL ?
Replace DATE_FORMAT()
with DATE_TRUNC()
.
I like to develop with SQLite. Simple and easy. I also like the idea that Rails is database agnostic. But why Rails doesn't provide a way to translate SQL functions that do the exact same thing, but have different syntax in each RDBMS (this difference is really stupid, but hey, it's too late to complain about it) ?
I can't believe that I find so few answers on the Web for such a basic feature of a Rails app : count the subscriptions per day, month or year.
Tell me I'm missing something :)
It's been a few years since I posted this question. Experience has shown that I should use the same DB for dev and prod. So I now consider the database agnostic requirement irrelevant.
Dev/prod parity FTW.
I ended up writing my own gem. Check it out and feel free to contribute: https://github.com/lakim/sql_funk
It allows you to make calls like:
Subscriber.count_by("created_at", :group_by => "day")
You speak of some pretty difficult problems that Rails, unfortunately, completely overlooks. The ActiveRecord::Calculations docs are written like they're all you ever need, but databases can do much more advanced things. As Donal Fellows mentioned in his comment, the problem is much trickier than it seems.
I've developed a Rails application over the last two years that makes heavy use of aggregation, and I've tried a few different approaches to the problem. I unfortunately don't have the luxary of ignoring things like daylight savings because the statistics are "only trends". The calculations I generate are tested by my customers to exact specifications.
To expand upon the problem a bit, I think you'll find that your current solution of grouping by dates is inadequate. It seems like a natural option to use STRFTIME. The primary problem is that it doesn't let you group by arbitrary time periods. If you want to do aggregation by year, month, day, hour, and/or minute, STRFTIME will work fine. If not, you'll find yourself looking for another solution. Another huge problem is that of aggregation upon aggregation. Say, for example, you want to group by month, but you want to do it starting from the 15th of every month. How would you do it using STRFTIME? You'd have to group by each day, and then by month, but then someone account for the starting offset of the 15th day of each month. The final straw is that grouping by STRFTIME necessitates grouping by a string value, which you'll find very slow when performing aggregation upon aggregation.
The most performant and best designed solution I've come to is one based upon integer time periods. Here is an excerpt from one of my mysql queries:
SELECT
field1, field2, field3,
CEIL((UNIX_TIMESTAMP(CONVERT_TZ(date, '+0:00', @@session.time_zone)) + :begin_offset) / :time_interval) AS time_period
FROM
some_table
GROUP BY
time_period
In this case, :time_interval is the number of seconds in the grouping period (e.g. 86400 for daily) and :begin_offset is the number of seconds to offset the period start. The CONVERT_TZ() business accounts for the way mysql interprets dates. Mysql always assumes that the date field is in the mysql local time zone. But because I store times in UTC, I must convert it from UTC to the session time zone if I want the UNIX_TIMESTAMP() function to give me a correct response. The time period ends up being an integer that describes the number of time intervals since the start of unix time. This solution is much more flexible because it lets you group by arbitrary periods and doesn't require aggregation upon aggregation.
Now, to get to my real point. For a robust solution, I'd recommend that you consider not using Rails at all to generate these queries. The biggest issue is that the performance characteristics and subtleties of aggregation are different across the databases. You might find one design that works well in your development environment but not in production, or vice-versa. You'll jump through a lot of hoops to get Rails to play nicely with both databases in query construction.
Instead I'd recommend that you generate database-specific views in your chosen database and bring those along to the correct environment. Try to model the view as you would any other ActiveRecord table (id's and all), and of course make the fields in the view identical across databases. Because these statistics are read-only queries, you can use a model to back them and pretend like they're full-fledged tables. Just raise an exception if somebody tries to save, create, update, or destroy.
Not only will you get simplified model management by doing things the Rails way, you'll also find that you can write units tests for your aggregation features in ways you wouldn't dream of in pure SQL. And if you decide to switch databases, you'll have to rewrite those views, but your tests will tell you where you're wrong, and make life so much easier.
I just released a gem that allows you to do this easily with MySQL. https://github.com/ankane/groupdate
You should really try to run MySQL in development, too. Your development and production environments should be as close as possible - less of a chance for something to work on development and totally break production.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With