I have the following data of sales for various categories of items:
category year salesVolume
1 2002 45
1 2003 47
2 2002 789
2 2003 908
3 2002 333
3 2003 123
41 2002 111
41 2003 90
Now I want to compare sales volume in the year 2002 to the year 2003, category wise, and write results as:
category salesIncreasing?
1 TRUE
2 TRUE
3 FALSE
41 FALSE
Is it possible to do this in SQL? If so, please let me know. I am using Impala SQL. Thanks.
Rows of the same table can be compared with a self join: the table is joined to itself so that two of its rows end up side by side in one result row. Typical uses are comparing the amounts of two consecutive days to calculate the sales made on a day, or finding all the customers that are in the same city.
A self join can also be used to compare consecutive rows or to get the difference between two rows. Say you have a table sales (id, order_date, amount) whose amount column contains cumulative sales totals. To find the sales made on each day, you compare each row's amount with that of the previous row.
To calculate any difference, you need two elements; to calculate a difference in SQL, you usually need two records. You can also calculate the difference between two columns in the same record.
If you are using ROW_NUMBER() in a query only so that you can filter or join on the row number later in your transformation, there may be a better way. Rather than co-opting ROW_NUMBER() to compare neighboring rows, apply the window function that was designed for this problem: LAG().
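As a rough sketch of how LAG() could answer the question above, the following runs the query against the question's sample data. It uses SQLite through Python's sqlite3 module as a stand-in for Impala (an assumption made only so the example is runnable; it needs SQLite 3.25 or later for window functions), and the table name MyTable is borrowed from the join answer below.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (category INTEGER, year INTEGER, salesVolume INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, 2002, 45), (1, 2003, 47),
        (2, 2002, 789), (2, 2003, 908),
        (3, 2002, 333), (3, 2003, 123),
        (41, 2002, 111), (41, 2003, 90),
    ],
)

# LAG() fetches the previous year's volume within each category,
# so the 2003 row can be compared with the 2002 row without a self join.
query = """
SELECT category,
       CASE WHEN salesVolume > prevVolume THEN 'TRUE' ELSE 'FALSE' END
           AS salesIncreasing
FROM (
    SELECT category, year, salesVolume,
           LAG(salesVolume) OVER (PARTITION BY category ORDER BY year)
               AS prevVolume
    FROM MyTable
    WHERE year IN (2002, 2003)
) t
WHERE year = 2003
ORDER BY category
"""
lag_result = conn.execute(query).fetchall()
print(lag_result)
# [(1, 'TRUE'), (2, 'TRUE'), (3, 'FALSE'), (41, 'FALSE')]
```

Note that a category with no 2002 row gets a NULL prevVolume, so this sketch reports it as 'FALSE'; whether that is the desired outcome depends on the use case.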
SELECT
    a.category,
    -- TRUE when the 2003 volume (b) exceeds the 2002 volume (a)
    CASE WHEN a.salesVolume < b.salesVolume THEN 'TRUE' ELSE 'FALSE' END AS salesIncreasing
FROM MyTable a
INNER JOIN MyTable b ON a.category = b.category
WHERE a.year = 2002
  AND b.year = 2003;
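The idea is to produce a single result in which each category's 2002 and 2003 sales sit side by side in one row, so that they can be compared. To do this, you join the table with itself and use two restrictions in the WHERE clause: one selects the 2002 row from the first copy, the other selects the 2003 row from the second copy.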
You can do this with conditional aggregation as well as using a join:
-- one output row per category, with each year's total in its own column
select fd.category,
       sum(case when year = 2002 then SalesVolume end) as sales_2002,
       sum(case when year = 2003 then SalesVolume end) as sales_2003,
       (case when sum(case when year = 2002 then SalesVolume end) is null
             then 'New2003'
             when sum(case when year = 2003 then SalesVolume end) is null
             then 'No2003'
             when sum(case when year = 2002 then SalesVolume end) > sum(case when year = 2003 then SalesVolume end)
             then 'Decreasing'
             when sum(case when year = 2002 then SalesVolume end) = sum(case when year = 2003 then SalesVolume end)
             then 'Equal'
             else 'Increasing'
        end) as Direction
from followingdata fd
where year in (2002, 2003)
group by fd.category;
The advantage of this approach over a join is that it handles all categories, even those that do not appear in both years.
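To make that difference concrete, here is a small sketch comparing the two approaches. It again uses SQLite through Python's sqlite3 module as a stand-in for Impala (an assumption), groups by the question's category column, and adds a hypothetical category 5 that has sales only in 2003. The conditional aggregation is written with a subquery for brevity, which is a simplification of the query above and not the answer's exact text.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE followingdata (category INTEGER, year INTEGER, SalesVolume INTEGER)"
)
conn.executemany(
    "INSERT INTO followingdata VALUES (?, ?, ?)",
    [
        (1, 2002, 45), (1, 2003, 47),
        (3, 2002, 333), (3, 2003, 123),
        (5, 2003, 10),  # hypothetical category with no 2002 row
    ],
)

# Self join: only categories that have rows in BOTH years survive.
join_categories = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.category
        FROM followingdata a
        INNER JOIN followingdata b ON a.category = b.category
        WHERE a.year = 2002 AND b.year = 2003
        ORDER BY a.category
        """
    )
]

# Conditional aggregation: every category appears, with NULL for a missing year.
agg_rows = conn.execute(
    """
    SELECT category,
           CASE WHEN sales_2002 IS NULL THEN 'New2003'
                WHEN sales_2003 IS NULL THEN 'No2003'
                WHEN sales_2002 > sales_2003 THEN 'Decreasing'
                WHEN sales_2002 = sales_2003 THEN 'Equal'
                ELSE 'Increasing'
           END AS Direction
    FROM (
        SELECT category,
               SUM(CASE WHEN year = 2002 THEN SalesVolume END) AS sales_2002,
               SUM(CASE WHEN year = 2003 THEN SalesVolume END) AS sales_2003
        FROM followingdata
        WHERE year IN (2002, 2003)
        GROUP BY category
    ) t
    ORDER BY category
    """
).fetchall()

print(join_categories)  # [1, 3]
print(agg_rows)         # [(1, 'Increasing'), (3, 'Decreasing'), (5, 'New2003')]
```

Category 5 is missing from the join result but is labelled 'New2003' by the aggregation.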