Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL - Getting statistical data

I need to collect some statistical information in my application. I have a table of users (tb_user) Every time a new user accesses the application, it adds a new record in this table, ie, one line for each user. The main field are id and date_hour (timestamp for the first time user accessed the application).

tb_user

id (bigint) | date_time (timestamp with time zone)
 1          |  2012-01-29 11:29:50.359-03
 2          |  2012-01-31 14:27:10.359-03

I need get:

amount average users by day, week and month

Example:

by day: 55.45

by week : XX.XX

month: XX.XX

EDIT:

My best solution was:

WITH daily_count AS (SELECT COUNT(id) AS user_count FROM tb_user)
SELECT user_count, tbaux2.days, (user_count/tbaux2.days) FROM daily_count, 
    (SELECT EXTRACT(DAY FROM (t2.diff) ) + 1 AS days
     FROM
       (with tbaux AS(SELECT  min(date_time) AS min FROM tb_user)
       SELECT (now() - min) AS diff
       FROM tbaux) AS t2) AS tbaux2
GROUP BY user_count, tbaux2.days

But this solution only worked with EXTRACT (DAY ... With weeks and month did not work

Any help is welcome.

Alternatively:

SELECT user_count, tbaux2.days, (user_count/tbaux2.days) AS userPerDay, ((user_count/tbaux2.days) * 7) AS userPerWeek, ((user_count/tbaux2.days) * 30) AS userPerMonth

EDIT 2:

Based on responses from @Bruno, there are some considerations:

When I asked the question, in really I requested a way to select data by day, month and year. I believe that the search that I posted and @Bruno refined, should be interpreted as average of "a day, every 7 days and every 30 days" and not by days, weeks and months. I believe that if it is interpreted in this way, there not will be problems of gender-quoted in example (10% drop). I believe this approach of "every" is answer I need in moment, so will sign this answer.

I suggest as an improvement of post:

  • Consider only closed day in result (not collect users of the current day, and not counting the current day in division)
  • The result is two numeric digits.
  • New research considering a data really per week and per month.

Thanks.

like image 454
vctlzac Avatar asked Feb 07 '12 12:02

vctlzac


1 Answers

You should look into aggregate functions (min, max, count, avg), which go hand in hand with GROUP BY. For date-based aggregations, date_trunc is also useful.

For example, this will return the number of rows per day:

SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count FROM tb_user
    GROUP BY date_trunc('day', date_time);

You can then do the daily average using something like this (with a CTE):

WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count FROM tb_user
    GROUP BY date_trunc('day', date_time))
SELECT AVG(user_count) FROM daily_count;

Use 'week' instead of day for the weekly counts, and so on (see date_trunc documentation).

EDIT: (Following comment: average up to and including 5/1/2012, i.e. before the 6th.)

WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count
    FROM tb_user
       WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06') 
    GROUP BY date_trunc('day', date_time))
SELECT SUM(user_count)/(DATE('2012-01-06') - DATE('2012-01-01')) FROM daily_count;

What's above is over-complicated, in this case. This should give you the same result:

SELECT COUNT(id)/(DATE('2012-01-06') - DATE('2012-01-01'))
    FROM tb_user
       WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06');

EDIT 2: After your edit, I guess what you're after is just a single global average for the entire period of existence of your database, rather than groups by month/week/day.

This should give you the average number of rows per day:

WITH total_min_max AS (SELECT
        COUNT(id) AS total_visits,
        MIN(date_time) AS first_date_time,
        MAX(date_time) AS last_date_time,
    FROM tb_user)
SELECT total_visits/((last_date_time::date-first_date_time::date)+1) AS users_per_day
    FROM total_min_max

(I would replace last_date_time with NOW() to make the average over the time until now, rather than until the last visit, if there's no recent visit.)

Then, for daily, weekly, and "monthly":

WITH daily_avg AS (
    WITH total_min_max AS (SELECT
            COUNT(id) AS total_visits,
            MIN(date_time) AS first_date_time,
            MAX(date_time) AS last_date_time,
        FROM tb_user)
    SELECT total_visits/((last_date_time::date-first_date_time::date)+1) AS users_per_day
        FROM total_min_max)
SELECT
         users_per_day,
         (users_per_day * 7) AS users_per_week,
         (users_per_month * 30) AS users_per_month
    FROM daily_avg

This being said, conclusions you draw from such statistics might not be great, especially if you want to see how it changes.

I would also normalise the data per day rather than assuming 30 days in a month (if not per hour, because not all days have 24 hours). Say you have 10 visits per day in Jan 2011 and 10 visits per day in Feb 2011. That gives you 310 visits in Jan and 280 visits in Feb. If you don't pay attention, you could think you've had a almost a 10% drop in terms of number of visitors, so something went wrong in Feb, when really, this isn't the case.

like image 159
Bruno Avatar answered Nov 07 '22 12:11

Bruno