Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL group by date, force to return null value if does not exists

Tags:

mysql

I built a query, to let me return average ranking of some Apps.

But for some of them, for some month, we don't have data, cause the app is a new one (let's say, the app is out since this month, and so we are collecting data since this month)

SELECT 
    DATE_FORMAT(date, '%Y-%m'),
    app_id,
    AVG(rank)
FROM wadstats.applestore_ranking
where app_id IN (100,
        2,
        3,
        4,
        5,
        6)
GROUP BY MONTH(date), app_id
ORDER BY CASE WHEN app_id = 100 THEN 1 ELSE 2 END, date ASC

I was in need to show first the app_id = 100

But for app_id = 8, I don't have data for august for exemple.

Then results looks like

'2015-07', '100', '3.9355'
'2015-04', '100', '49.5000'
'2015-08', '100', '5.2258'
'2015-05', '100', '16.3333'
'2015-09', '100', '6.1333'
'2015-06', '100', '7.5667'
'2015-10', '100', '5.7727'
'2015-04', '2', '6.0000'
'2015-08', '2', '9.8710'
'2015-05', '2', '6.4667'
'2015-09', '2', '8.9667'
'2015-06', '2', '8.5333'
'2015-10', '2', '9.9545'
'2015-07', '2', '10.5806'
'2015-05', '3', '56.3929'
'2015-09', '3', '55.1667'
'2015-06', '3', '35.2500'
'2015-07', '3', '38.7143'
'2015-04', '3', '38.7500'
'2015-08', '3', '52.5500'
'2015-09', '4', '30.2105'
'2015-06', '4', '27.9231'
'2015-10', '4', '30.0000'
'2015-07', '4', '47.0000'
'2015-08', '4', '32.6818'
'2015-06', '5', '46.8667'
'2015-10', '5', '86.6667'
'2015-07', '5', '63.5185'
'2015-04', '5', '24.2500'
'2015-08', '5', '67.3571'
'2015-10', '6', '30.1818'

i wish instead to have null for each month even if no data was available for this specific month

Expected results

'2015-07', '100', '3.9355'
'2015-04', '100', '49.5000'
'2015-08', '100', '5.2258'
'2015-05', '100', '16.3333'
'2015-09', '100', '6.1333'
'2015-06', '100', '7.5667'
'2015-10', '100', '5.7727'
'2015-04', '2', '6.0000'
'2015-08', '2', '9.8710'
'2015-05', '2', '6.4667'
'2015-09', '2', '8.9667'
'2015-06', '2', '8.5333'
'2015-10', '2', '9.9545'
'2015-07', '2', '10.5806'
'2015-05', '3', '56.3929'
'2015-09', '3', '55.1667'
'2015-06', '3', '35.2500'
'2015-07', '3', '38.7143'
'2015-04', '3', '38.7500'
'2015-08', '3', '52.5500'
'2015-09', '4', '30.2105'
'2015-06', '4', '27.9231'
'2015-05', '4', NULL
'2015-10', '4', '30.0000'
'2015-07', '4', '47.0000'
'2015-08', '4', '32.6818'
'2015-06', '5', '46.8667'
'2015-10', '5', '86.6667'
'2015-07', '5', '63.5185'
'2015-04', '5', '24.2500'
'2015-08', '5', '67.3571'
'2015-04', '6', NULL
'2015-05', '6', NULL
'2015-06', '6', NULL
'2015-07', '6', NULL
'2015-08', '6', NULL
'2015-09', '6', NULL
'2015-10', '6', '30.1818'

If i need to have 0 instead of NULL, that would be ok too, but I need to have for each month in the DB, to have value for each app_id

Thanks so much by advance

like image 739
Toucouleur Avatar asked Oct 22 '15 13:10

Toucouleur


2 Answers

The query below uses a derived table (alias inr) to put together YearMonth/app_id combinations. It then uses that in a left join to fetch data whether or not it exists in table applestore_ranking.

Use ifnull() if you want a zero to appear as opposed to a NULL. So that part would become ifnull(AVG(r.rank),0) as Rank, instead.

Note, it would help one to simply put this staging together then highlight the code for the inr select alone and see its simple output. This would make the left join, which then follows, easier to understand.

The concept of the Helper table is used all the time in sql. Sometimes they are put together on the fly, and dropped. Other times they are permanent.

Schema

create schema appleSandbox;
use appleSandbox;

-- drop table applestore_ranking;
create table applestore_ranking
(   id int auto_increment primary key,
    app_id int not null,
    date date not null, -- not a great column name
    rank int not null
);
-- truncate table applestore_ranking;
insert applestore_ranking (app_id,date,rank) values
(2,'2015-08-01',1),(2,'2015-09-05',10),(2,'2015-09-12',11),(2,'2015-10-01',14),
(6,'2015-10-01',7),(6,'2015-10-05',6),(6,'2015-10-14',2),
(100,'2015-09-01',16),(100,'2015-10-01',16),(100,'2015-10-05',17),(100,'2015-10-14',18);

create table monthHelper
(   -- load this up with a few years worth
    id int auto_increment primary key,
    theDate date not null, -- slightly better column name   
    wantToSee int not null -- do we want to see it in results or not? 0=no, 1=yes
);

-- note only a few wantToSee have been turned on to 1
insert monthHelper(theDate,wantToSee) values
('2015-05-01',0),('2015-06-01',0),('2015-07-01',0),('2015-08-01',1),('2015-09-01',1),('2015-10-01',1),('2015-11-01',0),('2015-12-01',0),
('2016-01-01',0),('2016-02-01',0),('2016-03-01',0); -- etc

The Query

SELECT DATE_FORMAT(inr.theDate, '%Y-%m') as YearMonth, inr.app_id, AVG(r.rank) as Rank
FROM
(   select distinct mh.theDate,r.app_id
    from monthhelper mh
    cross join applestore_ranking r
    where mh.wantToSee=1
    and r.app_id IN (100,2,3,4,5,6)
) inr
left join applestore_ranking r
on r.app_id=inr.app_id and year(inr.theDate)=year(r.date) and month(inr.theDate)=month(r.date)
GROUP BY MONTH(inr.theDate), inr.app_id
ORDER BY CASE WHEN inr.app_id = 100 THEN 1 ELSE 2 END, inr.theDate ASC

The Results

+-----------+--------+---------+
| YearMonth | app_id | Rank    |
+-----------+--------+---------+
| 2015-08   |    100 |    NULL |
| 2015-09   |    100 | 16.0000 |
| 2015-10   |    100 | 17.0000 |
| 2015-08   |      2 |  1.0000 |
| 2015-08   |      6 |    NULL |
| 2015-09   |      2 | 10.5000 |
| 2015-09   |      6 |    NULL |
| 2015-10   |      2 | 14.0000 |
| 2015-10   |      6 |  5.0000 |
+-----------+--------+---------+

Cleanup

drop schema AppleSandbox;
like image 114
Drew Avatar answered Nov 15 '22 05:11

Drew


thanks to Drew,

Here is a way to perform in a single query, what explained us in detail Drew.

Thanks a lot for your tip using combination of CROSS JOIN

   SELECT 
    inr.date AS 'month',
    inr.app_id,
    r.title,
    AVG(r.rank) as 'ranking'    
FROM 
    (
        SELECT
            DISTINCT DATE_FORMAT(mh.date, '%Y-%m') date,
            r.app_id
        FROM 
            applestore_ranking mh
        CROSS JOIN
            applestore_ranking r
        WHERE 
            mh.app_id = 100
        AND
            r.app_id IN (100, 1, 2, 3, 4, 5)
    ) inr
LEFT JOIN
    applestore_ranking r ON r.app_id = inr.app_id
AND inr.date = DATE_FORMAT(r.date, '%Y-%m') 
GROUP BY
    inr.date,
    inr.app_id
ORDER BY
    CASE WHEN inr.app_id = 100 THEN 1 ELSE 2 END
like image 31
Toucouleur Avatar answered Nov 15 '22 03:11

Toucouleur