I have the following table (my_data):
year | X | Y
-----+---+----
2010 | A | 10
2011 | A | 20
2011 | B | 99
2009 | C | 30
2010 | C | 40
What is the best (shortest) SQL statement to retrieve, for each value of X, only the row with the highest year, like this:
year | X | Y
-----+---+----
2011 | A | 20
2011 | B | 99
2010 | C | 40
Note that this result table will be used in a join.
Partition By: This divides the rows or query result set into small partitions.
Order By: This arranges the rows in ascending or descending order for the partition window. The default order is ascending.
Row or Range: You can further limit the rows in a partition by specifying the start and end points.
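To make the three clauses concrete, here is a minimal sketch that runs a running total over the question's sample data. It assumes Python's built-in sqlite3 module and SQLite 3.25 or newer (the first release with window functions); the column alias running_y is only an illustrative name.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (year INTEGER, x TEXT, y INTEGER)")
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?, ?)",
    [(2010, "A", 10), (2011, "A", 20), (2011, "B", 99),
     (2009, "C", 30), (2010, "C", 40)],
)

# PARTITION BY x  : one window per distinct value of x
# ORDER BY year   : rows inside each window are sorted by year
# ROWS BETWEEN ...: the frame runs from the window's first row to the current row
clause_demo = conn.execute(
    """
    SELECT year, x, y,
           SUM(y) OVER (
               PARTITION BY x
               ORDER BY year
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_y
    FROM my_data
    ORDER BY x, year
    """
).fetchall()

for row in clause_demo:
    print(row)
# (2010, 'A', 10, 10)
# (2011, 'A', 20, 30)
# (2011, 'B', 99, 99)
# (2009, 'C', 30, 30)
# (2010, 'C', 40, 70)
```

The running total restarts for every value of x, which is the effect of the partition; the frame clause is what makes it cumulative rather than a per-partition total.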
A GROUP BY normally reduces the number of rows returned by rolling them up and calculating averages or sums for each group. PARTITION BY does not affect the number of rows returned, but it changes how a window function's result is calculated.
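The difference in row counts can be checked on the question's sample data. This is a small sketch, again assuming Python's sqlite3 module with SQLite 3.25 or newer; the variable names grouped and windowed are illustrative.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (year INTEGER, x TEXT, y INTEGER)")
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?, ?)",
    [(2010, "A", 10), (2011, "A", 20), (2011, "B", 99),
     (2009, "C", 30), (2010, "C", 40)],
)

# GROUP BY collapses the five rows into one row per value of x.
grouped = conn.execute(
    "SELECT x, MAX(year) FROM my_data GROUP BY x ORDER BY x"
).fetchall()

# PARTITION BY keeps all five rows and attaches the per-x maximum to each.
windowed = conn.execute(
    """
    SELECT year, x, y, MAX(year) OVER (PARTITION BY x) AS max_year
    FROM my_data
    ORDER BY x, year
    """
).fetchall()

print(len(grouped), grouped)
# 3 [('A', 2011), ('B', 2011), ('C', 2010)]
print(len(windowed))
# 5
```

Because the windowed query keeps the y column next to max_year on every row, a plain filter on year = max_year is enough to pick the wanted rows.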
However, the PARTITION BY is still slower than the GROUP BY. The IO for the PARTITION BY is now much less than for the GROUP BY, but the CPU for the PARTITION BY is still much higher. Even when there is lots of memory, PARTITION BY – and many analytical functions – are very CPU intensive.
So you can use the WHERE clause without any issue. This is not an issue with the PARTITION BY clause; it is a matter of how NULL is handled.
select year, x, y
from (
    select year, x, y, max(year) over (partition by x) max_year
    from my_data
) t
where year = max_year
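As a quick check, the window-function query can be run against the question's sample data, next to a GROUP BY plus join variant that is not from the original answer. This sketch assumes Python's sqlite3 module with SQLite 3.25 or newer, writes the table name as my_data, and gives the derived tables the illustrative aliases t and m, since several engines require an alias there.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (year INTEGER, x TEXT, y INTEGER)")
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?, ?)",
    [(2010, "A", 10), (2011, "A", 20), (2011, "B", 99),
     (2009, "C", 30), (2010, "C", 40)],
)

# Window-function form: tag each row with its group's latest year, then filter.
window_sql = """
SELECT year, x, y
FROM (
    SELECT year, x, y, MAX(year) OVER (PARTITION BY x) AS max_year
    FROM my_data
) AS t
WHERE year = max_year
ORDER BY x
"""

# Alternative form (an assumption, not the answer's method):
# compute the latest year per x with GROUP BY, then join back to get y.
group_sql = """
SELECT d.year, d.x, d.y
FROM my_data AS d
JOIN (SELECT x, MAX(year) AS max_year FROM my_data GROUP BY x) AS m
  ON m.x = d.x AND m.max_year = d.year
ORDER BY d.x
"""

window_rows = conn.execute(window_sql).fetchall()
group_rows = conn.execute(group_sql).fetchall()

print(window_rows)
# [(2011, 'A', 20), (2011, 'B', 99), (2010, 'C', 40)]
print(window_rows == group_rows)
# True
```

Either form can be used as a derived table in a larger join. Both return every tied row if a group has more than one row with the same latest year.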