
ORDER BY in SQL where column is synthetic string with embedded integer

Tags: sql, sql-server, h2

I have the following table :

CREATE TABLE TEST 
(
     name VARCHAR(10), 
     date_of_entry DATE, 
     flag1 INT, 
     flag2 INT,
     salary FLOAT, 
     flag3 INT,
     id INT
);

with the following rows :

name    date_of_entry   flag1   flag2   salary      flag3   id
--------------------------------------------------------------
AGMA    2018-11-08      0       1       265466940   1       1
AGMA    2018-11-08      0       1       220737125   1       2
AGMA    2018-11-08      0       1       181270493   0       3
AGMA    2018-11-08      0       1       8584205     0       4

I would like to execute the following SQL to order the rows in a specific manner :

SELECT 
    name
    + '.' + CONVERT(varchar(8), date_of_entry, 112) 
    + '.' + CONVERT(varchar(1), flag1)
    + '.' + CONVERT(varchar(1), flag2)
    + '.' + CONVERT(varchar(2555), salary)
    + '.' + CONVERT(varchar(1), flag3)
    + '.' + CONVERT(varchar(1), id) AS SYNTHETIC_ORDER
FROM 
    TEST
ORDER BY 
    SYNTHETIC_ORDER DESC

However, the salary column gets sorted incorrectly within the string. So my end result is (when executed within Microsoft SQL Server) :

SYNTHETIC_ORDER
-----------------------------------
AGMA.20181108.0.1.8.58421e+006.0.4
AGMA.20181108.0.1.2.65467e+008.1.1
AGMA.20181108.0.1.2.20737e+008.1.2
AGMA.20181108.0.1.1.8127e+008.0.3

As can be noted, the result is that id 4 comes first, when I want id 1 to come first.

Expected result :

SYNTHETIC_ORDER
-----------------------------------
AGMA.20181108.0.1.2.65467e+008.1.1
AGMA.20181108.0.1.2.20737e+008.1.2
AGMA.20181108.0.1.1.8127e+008.0.3
AGMA.20181108.0.1.8.58421e+006.0.4

Is there a way to ensure that the salary is correctly ordered in this SQL?

Krish Srinivasan asked Jun 05 '26 00:06


2 Answers

Why can't you just order it by the individual columns?

SELECT 
    date_of_entry, flag1, flag2, salary, flag3 
    , name
        + '.' + CONVERT(varchar(8), date_of_entry, 112) 
        + '.' + CONVERT(varchar(1), flag1)
        + '.' + CONVERT(varchar(1), flag2)
        + '.' + CONVERT(varchar(2555), salary)
        + '.' + CONVERT(varchar(1), flag3)
        + '.' + CONVERT(varchar(1), id) AS SYNTHETIC_ORDER
FROM  TEST
ORDER BY date_of_entry DESC, flag1 DESC, flag2 DESC, salary DESC, flag3 DESC

This will get you only the MAX (top-ranked) row:

SELECT SYNTHETIC_ORDER
FROM (
    SELECT 
        ROW_NUMBER() OVER(ORDER BY date_of_entry DESC, flag1 DESC, flag2 DESC, salary DESC, flag3 DESC) AS RowNum
        , name
            + '.' + CONVERT(varchar(8), date_of_entry, 112) 
            + '.' + CONVERT(varchar(1), flag1)
            + '.' + CONVERT(varchar(1), flag2)
            + '.' + CONVERT(varchar(2555), salary)
            + '.' + CONVERT(varchar(1), flag3)
            + '.' + CONVERT(varchar(1), id) AS SYNTHETIC_ORDER
    FROM  TEST
) a
WHERE RowNum = 1

Eric answered Jun 07 '26 22:06


A fixed-width representation that uses only functions available in both H2 (not tagged) and SQL Server (tagged):

SELECT 
    CONCAT(
      CAST(name as CHAR(10)), --right pad to 10

      YEAR(date_of_entry),
      RIGHT(CONCAT('0',MONTH(date_of_entry)),2),
      RIGHT(CONCAT('0',DAY(date_of_entry)),2), --yyyymmdd

      CAST(flag1 as CHAR(1)), --rpad to 1, doesn't need cast if never null/0 length

      CAST(flag2 as CHAR(1)), --maybe doesn't need cast, see above

      RIGHT(CONCAT('0000000000', CAST(salary AS INT)),10), --lpad with 0 to 10 wide

      CAST(flag3 as CHAR(1)), --maybe doesn't need cast, see above

      RIGHT(CONCAT('0000000000', id), 10) --lpad with 0 to 10 wide

    ) AS SYNTHETIC_ORDER
FROM 
    TEST
ORDER BY 
    SYNTHETIC_ORDER DESC

Points of note:

  • Your CREATE TABLE statement doesn't mention ID, but your query does; included ID

  • Your query doesn't mention NAME but your example data output does; included NAME

  • You might not need to pad the ID or salary so much

  • Some of the casts to chars (e.g. on flag columns) can be dropped (if the flag column is 100% guaranteed to always be 1 char long)

  • If salary max value in table is larger than an int can hold, consider a cast to something else

  • Padding the salary with leading zeroes makes the string sort work out. Normalising it to a value between 0 and 1 could also work if all the values were padded out to the same width, but you then risk a loss of precision: dividing a 10-digit salary down to e.g. 0.123456 can make two different salaries merge because there aren't enough digits to represent them fully. Any division that quantizes to lower precision risks the original values sorting wrongly (e.g. if salaries of 1000000000 and 1000000001, with ids of 2 and 1 respectively, both normalise to 0.123456, they would end up sorted wrongly). To guard against this you probably need as many digits in the division result as the salary had in the first place, padded to a fixed width; but if you've gone that far you might as well just pad the salaries themselves, either to the width of the widest or to some width that will contain them all. Here the cast to an int can actually be handy if the int might overflow: you can decide to pad to one digit wider than an int will hold, and then if someone inserts a larger value in future the query starts failing with an overflow error rather than silently delivering wrong results because the pad is chopping digits off the left-hand edge. When dealing with the cast, you can also choose whether to add some logic that pads out to the LENGTH() of the string form of the SELECT MAX salary.
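
As a minimal sketch of the padding ideas in the last two points, written for SQL Server: it assumes salaries are non-negative whole numbers that fit in a BIGINT. The 19-character width, the alias salary_padded and the variable @width are illustrative choices, not part of the question. Note that SQL Server spells the length function LEN() rather than LENGTH().

```sql
-- Variant 1: fixed width. A BIGINT has at most 19 digits, so pad to 19.
SELECT
    id,
    RIGHT(CONCAT('0000000000000000000', CAST(salary AS BIGINT)), 19) AS salary_padded
FROM TEST
ORDER BY salary_padded DESC;

-- Variant 2: pad only to the width of the widest salary currently in the table.
-- (T-SQL specific: uses a variable, LEN and REPLICATE.)
DECLARE @width INT =
    (SELECT MAX(LEN(CAST(CAST(salary AS BIGINT) AS VARCHAR(20)))) FROM TEST);

SELECT
    id,
    RIGHT(REPLICATE('0', @width) + CAST(CAST(salary AS BIGINT) AS VARCHAR(20)), @width) AS salary_padded
FROM TEST
ORDER BY salary_padded DESC;
```

With the four sample rows, both variants should return the ids in the order 1, 2, 3, 4. The cast truncates any fractional part of the salary, so salaries that differ only after the decimal point would tie in this sketch.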

CONCAT is nice cos you can pass most types to it without first casting to varchar, and it doesn't null the whole thing if you concat a null on, unlike regular string concat ops with + or ||
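
A quick way to see the difference in SQL Server, as a sketch (the column aliases are made up for illustration, and the + result assumes the default CONCAT_NULL_YIELDS_NULL setting of ON):

```sql
SELECT
    'AGMA' + '.' + CAST(NULL AS VARCHAR(10))       AS plus_result,   -- NULL: one NULL operand nulls the whole string
    CONCAT('AGMA', '.', CAST(NULL AS VARCHAR(10))) AS concat_result; -- 'AGMA.': CONCAT treats NULL as an empty string
```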

Caius Jard answered Jun 07 '26 22:06


