UPDATE:
ORACLE VERSION 10G
I have a list of records in Oracle
as follows, these are actually sections of various books
The records are generated in the below format
[main topic].[sub topic].[first level section] ..... .[last level section]
Sections
--------
1
7.1
6.2
7.1
7.4
6.8.3
6.8.2
10
1.1
7.6
6.1
11
8.3
8.5
1.1.2
6.4
6.6
8.4
1.1.6
6.8.1
7.7.1
7.5
7.3
I want to order this like as follows
1
1.1
1.1.2
1.1.6
6.2
6.4
6.5
6.6
6.7
6.8.1
6.8.2
6.8.3
7.2
7.3
7.4
7.5
7.6
7.7.1
7.7.2
8.3
8.4
8.5
10
But as the field is not a numeric
datatype
the sorting results in something like this
1
10
1.1
1.1.2
1.1.6
....
.....
8.5
How can I sort them. I am unable to convert them to number due to multiple number of decimal points.
Is there any function in oracle
that supports such a sorting technique
When the maximum depth is known, you can split the section in sub-sections:
SQL> SELECT SECTION FROM DATA
2 ORDER BY to_number(regexp_substr(SECTION, '[^.]+', 1, 1)) NULLS FIRST,
3 to_number(regexp_substr(SECTION, '[^.]+', 1, 2)) NULLS FIRST,
4 to_number(regexp_substr(SECTION, '[^.]+', 1, 3)) NULLS FIRST;
SECTION
-------
1
1.1
1.1.2
1.1.6
6.1
6.2
[...]
8.5
10
11
If the maximum depth of sub-sections is unknown (but presumably less than a couple hundred on 8-bit character databases or less than a few thousands in ANSI-character databases), you could define a function that converts your unsortable digits into sortable characters:
SQL> CREATE OR REPLACE FUNCTION order_section (p_section VARCHAR2)
2 RETURN VARCHAR2 IS
3 l_result VARCHAR2(4000);
4 BEGIN
5 FOR i IN 1..regexp_count(p_section, '[^.]+') LOOP
6 l_result := l_result
7 || CASE WHEN i > 1 THEN '.' END
8 || CHR(64+regexp_substr(p_section, '[^.]+', 1, i));
9 END LOOP;
10 RETURN l_result;
11 END;
12 /
Function created
SQL> SELECT SECTION, order_section(SECTION)
2 FROM DATA
3 ORDER BY 2;
SECTION ORDER_SECTION(SECTION)
------- -------------------------
1 A
1.1 A.A
1.1.2 A.A.B
1.1.6 A.A.F
6.1 F.A
6.2 F.B
[...]
8.5 H.E
10 J
11 K
Solution without regexp and functions (suppose t
is a table with source data):
select * from t
order by
(
select
sum(
to_number(substr(
sections,
decode(level,
1,1,
instr(sections, '.', 1, level-1)+1
),
decode(instr(sections, '.', 1, level),
0, length(sections),
instr(sections, '.', 1, level)
-
decode(level,
1,1,
instr(sections, '.', 1, level-1)+1
)
)
))
* power(1000, 10-level)
)
from dual
connect by instr(sections,'.',1,level-1) > 0
)
SQLFiddle example
Main idea is to calculate number, wich indicates priority of each row. Suppose, we have 33.17.21.2
value. This string may be treated as a number in hypotetical numeral system with base Q
like a hexadecimal numbers represents IPv4 address, and then converted to a numeric representation:33*(Q^3) + 17*(Q^2) + 21*(Q^1) + 2*(Q^0)
For example, if Q=100
then number from exmple is
33*1000000 + 17*10000 + 21*100 + 2*1 = 33172102
First trouble with this approach is that each level numbers required to be less than choosed Q
value. It's by design and can't be eleminated.
Next is that we don't know how many levels at all, we have 7.1
and 2.2.2.2.2.2
, and shorter one most come first. Therefore while calculating value it starts from some fixed power N
and then degrades power of Q
, so in case of Q=100
and N=3
sequence of multipilers starts with this numbers:
1000000, 10000, 100, 1, 1/100, 1/10000, 1/1000000, ...
In code above Q=1000
and N=10
, but this may be changed depending on required parameters.
Number of levels limited by choosed Q
value and precision of Oracle number
type. Theoretically it's possible to build expression for longer strings by splitting string into parts.
Rest of the code is just hierarchical query for splitting string to sequence of numbers.
Same approach may be used easily just for strings: '20'
comes before '8'
because information about second digit are missing. If we pad both values to some fixed length it ordered as expected: '008' < '020'
, so it's possible to deal with strings only:
select * from t order by
(
select
listagg(
lpad(
substr(
sections,
decode( level,
1,1,
instr(sections, '.', 1, level-1)+1
),
decode(instr(sections, '.', 1, level),
0, length(sections),
instr(sections, '.', 1, level)
-
decode(level,
1, 1,
instr(sections, '.', 1, level-1)+1
)
)
),
8,'0'
),
'-'
) within group (order by level)
from dual
connect by instr(sections,'.',1,level-1) > 0
)
With string length limitation of 4000 chars and 9 digits on each level with single separation symbol ('-'
in example above) it's possible to handle 400 levels of hierarchy.
Main disadvantage of this method is a memory consumption and comparison speed.
From other side, lack of a conversion to a number makes it compatible even with mixed chapter numbering( things like '13.3.a.vii'
or 'III.A.13.2'
(Ooops ... roman numerals handled improperly)
In case of decimal-number-only numbering variant with strings may be compacted by translation of numbers to hexadecimal representation. With 4 hex symbols it's possible to handle 16535 numbers on each level, and with 8 symbols - full 32-bit number which more than enough for most applications.
select * from t order by
(
select
listagg(
lpad(
trim(to_char(
to_number(substr(
sections,
decode( level,
1,1,
instr(sections, '.', 1, level-1)+1
),
decode(instr(sections, '.', 1, level),
0, length(sections),
instr(sections, '.', 1, level)
-
decode(level,
1, 1,
instr(sections, '.', 1, level-1)+1
)
)
)),
'XXXXXXXX'
)),
4,'0'
),
'-'
) within group (order by level)
from dual
connect by instr(sections,'.',1,level-1) > 0
)
P.S. Of course, it's possible to use all expressions above in select list to examine calculated values instead of using it in order by
.
In case the number of level is fix (e.g. max. 4) you can use this one:
ORDER BY
TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 1)) NULLS FIRST,
TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 2)) NULLS FIRST,
TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 3)) NULLS FIRST,
TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 4)) NULLS FIRST
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With