Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to PostgreSQL Distinct and Format

I have a 3.5 million rows in table acs_objects and I need to retrieve column creation_date with year only format and distinct.

My first attempt : 180~200 Sec (15 Rows Fetched)

SELECT DISTINCT to_char(creation_date,'YYYY') FROM acs_objects

My second attempt : 35~40 Sec (15 Rows Fetched)

SELECT DISTINCT to_char(creation_date,'YYYY')
FROM (SELECT DISTINCT creation_date FROM acs_objects) AS distinct_date

Is there any way to make it faster? -"I need to use this in ADP Website"

like image 763
Leonel Sarmiento Avatar asked May 15 '15 07:05

Leonel Sarmiento


2 Answers

I think you shouldn't select distinct from this huge table. Instead try to generate a short years sequence (say from 1900 to 2100) and select from this sequence only years that exist in acs_objects table. Result set will be the same but I think it will be faster. EXISTS subquery have to run fast on an indexed field creation_date.

SELECT y 
FROM
(
   select generate_series(1900,2100) as y
) as t
WHERE EXISTS (SELECT 1 FROM acs_objects 
                    WHERE creation_date >= DATE (''||t.y||'-01-01')     
                           AND  creation_date < DATE (''||t.y + 1||'-01-01'))

SQLFiddle demo

like image 138
valex Avatar answered Oct 23 '22 07:10

valex


In your second attempt you get distinct dates from the sub-query which you then all convert to a string representation and then you select the distinct ones. That is rather inefficient. Better is it to first extract the distinct years from the creation_date in a sub-query and simply cast those to text in the main query:

SELECT year::text
FROM (
  SELECT DISTINCT extract(year FROM creation_date) AS year FROM acs_objects
) AS distinct_years;

If you create an INDEX on the table, the query should run much faster still:

CREATE INDEX really_fast ON acs_objects((extract(year FROM creation_date)));

However, this may impact other uses of your table, in particular if you have many modifying statements (insert, update, delete). And this will only work if creation_date has a data type of date or timestamp (specifically not timestamp with timezone).

The below option looked promising because it does not use a sub-query, but it is in fact much slower (see comments below), probably because the DISTINCT clause is applied on a string:

SELECT DISTINCT extract(year FROM creation_date)::text
FROM acs_objects;
like image 7
Patrick Avatar answered Oct 23 '22 09:10

Patrick