Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting records from Oracle with multiple decimal points (.)

UPDATE:

ORACLE VERSION 10G

I have a list of records in Oracle as follows, these are actually sections of various books
The records are generated in the below format

[main topic].[sub topic].[first level section] ..... .[last level section]

Sections
--------
1
7.1
6.2 
7.1
7.4
6.8.3
6.8.2
10
1.1
7.6
6.1
11
8.3
8.5
1.1.2
6.4
6.6
8.4
1.1.6
6.8.1
7.7.1
7.5
7.3

I want to order this like as follows

 1
 1.1
 1.1.2
 1.1.6
 6.2    
 6.4    
 6.5    
 6.6    
 6.7    
 6.8.1    
 6.8.2    
 6.8.3    
 7.2    
 7.3    
 7.4    
 7.5    
 7.6    
 7.7.1    
 7.7.2    
 8.3    
 8.4    
 8.5
 10

But as the field is not a numeric datatype the sorting results in something like this

1
10
1.1
1.1.2
1.1.6
....
.....
8.5

How can I sort them. I am unable to convert them to number due to multiple number of decimal points.

Is there any function in oracle that supports such a sorting technique

like image 758
Sangeet Menon Avatar asked Jan 09 '14 12:01

Sangeet Menon


3 Answers

When the maximum depth is known, you can split the section in sub-sections:

SQL> SELECT SECTION FROM DATA
  2   ORDER BY to_number(regexp_substr(SECTION, '[^.]+', 1, 1)) NULLS FIRST,
  3            to_number(regexp_substr(SECTION, '[^.]+', 1, 2)) NULLS FIRST,
  4            to_number(regexp_substr(SECTION, '[^.]+', 1, 3)) NULLS FIRST;

SECTION
-------
1
1.1
1.1.2
1.1.6
6.1
6.2
[...]
8.5
10
11

If the maximum depth of sub-sections is unknown (but presumably less than a couple hundred on 8-bit character databases or less than a few thousands in ANSI-character databases), you could define a function that converts your unsortable digits into sortable characters:

SQL> CREATE OR REPLACE FUNCTION order_section (p_section VARCHAR2)
  2     RETURN VARCHAR2 IS
  3     l_result VARCHAR2(4000);
  4  BEGIN
  5     FOR i IN 1..regexp_count(p_section, '[^.]+') LOOP
  6        l_result := l_result
  7                    || CASE WHEN i > 1 THEN '.' END
  8                    || CHR(64+regexp_substr(p_section, '[^.]+', 1, i));
  9     END LOOP;
 10     RETURN l_result;
 11  END;
 12  /

Function created

SQL> SELECT SECTION, order_section(SECTION)
  2    FROM DATA
  3   ORDER BY 2;

SECTION ORDER_SECTION(SECTION)
------- -------------------------
1       A
1.1     A.A
1.1.2   A.A.B
1.1.6   A.A.F
6.1     F.A
6.2     F.B
[...]
8.5     H.E
10      J
11      K
like image 61
Vincent Malgrat Avatar answered Nov 15 '22 20:11

Vincent Malgrat


Solution without regexp and functions (suppose t is a table with source data):

select * from t
order by
    (
      select 
        sum(
          to_number(substr(
                   sections,
                   decode(level,
                     1,1,
                     instr(sections, '.', 1, level-1)+1
                   ),
                   decode(instr(sections, '.', 1, level),
                     0, length(sections),
                     instr(sections, '.', 1, level) 
                     - 
                     decode(level,
                       1,1,
                       instr(sections, '.', 1, level-1)+1
                     )
                   )  
          )) 
          * power(1000, 10-level)
        )
      from dual
        connect by instr(sections,'.',1,level-1) > 0
    ) 

SQLFiddle example

Main idea is to calculate number, wich indicates priority of each row. Suppose, we have 33.17.21.2 value. This string may be treated as a number in hypotetical numeral system with base Q like a hexadecimal numbers represents IPv4 address, and then converted to a numeric representation:
33*(Q^3) + 17*(Q^2) + 21*(Q^1) + 2*(Q^0)

For example, if Q=100 then number from exmple is

33*1000000 + 17*10000 + 21*100 + 2*1 = 33172102

First trouble with this approach is that each level numbers required to be less than choosed Q value. It's by design and can't be eleminated.

Next is that we don't know how many levels at all, we have 7.1 and 2.2.2.2.2.2, and shorter one most come first. Therefore while calculating value it starts from some fixed power N and then degrades power of Q, so in case of Q=100 and N=3 sequence of multipilers starts with this numbers: 1000000, 10000, 100, 1, 1/100, 1/10000, 1/1000000, ...

In code above Q=1000 and N=10, but this may be changed depending on required parameters. Number of levels limited by choosed Q value and precision of Oracle number type. Theoretically it's possible to build expression for longer strings by splitting string into parts.

Rest of the code is just hierarchical query for splitting string to sequence of numbers.

Update

Same approach may be used easily just for strings: '20' comes before '8' because information about second digit are missing. If we pad both values to some fixed length it ordered as expected: '008' < '020', so it's possible to deal with strings only:

select * from t order by 
  (
    select
      listagg(
        lpad(
          substr(
            sections,
            decode( level,
              1,1,
              instr(sections, '.', 1, level-1)+1
            ),
            decode(instr(sections, '.', 1, level),
              0, length(sections),
              instr(sections, '.', 1, level)
              -
              decode(level,
                1, 1,
                instr(sections, '.', 1, level-1)+1
              )
            )
          ),
          8,'0'
        ),
        '-'
      ) within group (order by level)
    from dual
    connect by instr(sections,'.',1,level-1) > 0
  )

With string length limitation of 4000 chars and 9 digits on each level with single separation symbol ('-' in example above) it's possible to handle 400 levels of hierarchy.

Main disadvantage of this method is a memory consumption and comparison speed. From other side, lack of a conversion to a number makes it compatible even with mixed chapter numbering( things like '13.3.a.vii' or 'III.A.13.2' (Ooops ... roman numerals handled improperly)

In case of decimal-number-only numbering variant with strings may be compacted by translation of numbers to hexadecimal representation. With 4 hex symbols it's possible to handle 16535 numbers on each level, and with 8 symbols - full 32-bit number which more than enough for most applications.

select * from t order by 
  (
    select
      listagg(
        lpad(
          trim(to_char(
            to_number(substr(
              sections,
              decode( level,
                1,1,
                instr(sections, '.', 1, level-1)+1
              ),
              decode(instr(sections, '.', 1, level),
                0, length(sections),
                instr(sections, '.', 1, level)
                -
                decode(level,
                  1, 1,
                  instr(sections, '.', 1, level-1)+1
                )
              )
            )),
            'XXXXXXXX'
          )),
          4,'0'
        ),
        '-'
      ) within group (order by level)
    from dual
    connect by instr(sections,'.',1,level-1) > 0
  ) 

P.S. Of course, it's possible to use all expressions above in select list to examine calculated values instead of using it in order by.

like image 37
ThinkJet Avatar answered Nov 15 '22 19:11

ThinkJet


In case the number of level is fix (e.g. max. 4) you can use this one:

ORDER BY 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 1)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 2)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 3)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 4)) NULLS FIRST
like image 43
Wernfried Domscheit Avatar answered Nov 15 '22 21:11

Wernfried Domscheit