Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected result with ORDER BY

I have the following query:

SELECT    
    D.[Year] AS [Year]
    , D.[Month] AS [Month]
    , CASE
        WHEN f.Dept IN ('XSD') THEN 'Marketing'
        ELSE f.Dept 
    END AS DeptS 
    , COUNT(DISTINCT f.OrderNo) AS CountOrders
FROM Sales.LocalOrders AS l WITH 
INNER JOIN Sales.FiscalOrders AS f 
    ON l.ORDER_NUMBER = f.OrderNo
INNER JOIN Dimensions.Date_Dim AS D 
    ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE)
WHERE YEAR(f.OrderDate) = 2019
AND f.Dept IN ('XSD', 'PPM', 'XPP')
GROUP BY 
    D.[Year]
    , D.[Month]
    , f.Dept 
ORDER BY 
    D.[Year] ASC
    , D.[Month] ASC

I get the following result the ORDER BY isn't giving the right result with Month column as we can see it is not ordered:

Year    Month   Depts       CountOrders
2019    1       XSD         200
2019    10      PPM         290
2019    10      XPP         150
2019    2       XSD         200
2019    3       XPP         300

The expected output:

Year    Month   Depts       CountOrders
2019    1       XSD         200
2019    2       XSD         200
2019    3       XPP         300
2019    10      PPM         290
2019    10      XPP         150
like image 514
Martin James Avatar asked Dec 18 '22 11:12

Martin James


1 Answers

Your query

It is ordered by month, as your D.[Month] is treated like a text string in the ORDER BY clause.

You could do one of two things to fix this:

  • Use a two-digit month number (e.g. 01... 12)
  • Use a data type for the ORDER BY clause that will be recognized as representing a month

A quick fix

You can correct this in your code by quickly changing the ORDER BY clause to analyze those columns as though they are numbers, which is done by converting ("casting") them to an integer data type like this:

ORDER BY 
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC

This will correct your unexpected query results, but does not address the root cause, which is your underlying data (more on that below).

Your underlying data

The root cause of your issue is how your underlying data is stored and/or surfaced.

Your Month seems to be appearing as a default data type (VarChar), rather than something more specifically suited to a month or date.

If you administer or have access to or control over the database, it is a good idea to consider correcting this.

In considering this, be mindful of potential context and change management issues, including:

  • Is this underlying data, or just a representation of upstream data that is elsewhere? (e.g. something that is refreshed periodically using a process that you do not control, or a view that is redefined periodically)
  • What other queries or processes rely on how this data is currently stored or surfaced (including data types), that may break if you mess with it?
  • Might there be validation issues if correcting it? (such as from the way zero, null, non-numeric or non-date data is stored, even if invalid)
  • What change management practices should be followed in your environment?
  • Is the data source under high transactional load?
  • Is it a production dataset?
  • Are other reporting processes dependent on it?

None of these issues are a good excuse to leave something set up incorrectly forever, which will likely compound the issue and introduce others. However, that is only part of the story.

The appropriate approach (correct it, or leave it) will depend on your situation. In a perfect textbook world, you'd correct it. In your world, you will have to decide.

A better way?

The above solution is a bit of a quick and nasty way to force your query to work.

The fact that the solution CASTs late in the query syntax, after the results have been selected and filtered, hints that is not the most elegant way to achieve this.

Ideally you can convert data types as early as possible in the process:

  • If done in underlying data, not the query, this is the ultimate but may not suit the situation (see below)
  • If done in the query, try to do it earlier.

In your case, your GROUP BY and ORDER BY are both using columns that look to be redundant data from the original query results, that is, you are getting a DATE and a MONTH and a YEAR. Ideally you would just get a DATE and then use the MONTH or YEAR from that date. Your issue is your dates are not actually dates (see "underlying data" above), which:

  • In the case of DATE, is converted in your INNER JOIN line ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE) (likely to minimise issues with the join)
  • In the case of D.[year] and D.[month], are not converted (which is why we still need to convert them further down, in ORDER BY)

You could consider ignoring D.[month] and use the MONTH DATEPART computed from DATE, which would avoid the need to use CAST in the ORDER BY clause.

In your instance, this approach is a middle ground. The quick fix is included at the top of this answer, and the best fix is to correct the underlying data. This last section considers optimizing the quick fix, but does not correct the underlying issue. It is only mentioned for awareness and to avoid promoting the use of CAST in an ORDER BY clause as the most legitimate way of addressing your issue with good clean query syntax.

There are also potential performance tradeoffs between how many columns you select that you don't need (e.g. all of the ones in D?), whether to compute the month from the date or a seperate month column, whether to cast to date before filtering, etc. These are beyond the scope of this solution.

So:

  • The immediate solution: use the quick fix
  • The optimal solution: after it's working, consider the underlying data (in your situation)
like image 117
ed2 Avatar answered Jan 02 '23 22:01

ed2