I have a unique list of strings (the original idea was the column names in a table). The task is to perform a maximal possible abbreviation of the list, so the list remains distinct.
For example AAA, AB
can be abbreviated to AA, AB
. (But not to A, AB
– as A
could be prefix of both AAA
and AB
).
AAAA, BAAAA
can be shorten to A, B
.
But A1, A2
can’t be abbreviated at all.
Here are the sample data
create table tab as
select 'AAA' col from dual union all
select 'AABA' col from dual union all
select 'COL1' col from dual union all
select 'COL21' col from dual union all
select 'AAAAAA' col from dual union all
select 'BBAA' col from dual union all
select 'BAAAA' col from dual union all
select 'AB' col from dual;
The expected result is
COL ABR_COL
------ ------------------------
AAA AAA
AAAAAA AAAA
AABA AAB
AB AB
BAAAA BA
BBAA BB
COL1 COL1
COL21 COL2
I managed a brute force solution consisting of four subqueries, which I do not post on purpose, because I hope there exists a more simple solution from which I do not want to distract.
Btw there is a similar function in r
called abbreviate
, but I’m looking for SQL solution. Prefered Oracle
solutions for other RDBMS are welcommed.
An abbreviation, simply put, is a shortened form of a word. In writing, abbreviations are useful when you need to squeeze a lot of writing into a small space. You can also use them in place of long or cumbersome phrases to make your sentences easier to read.
An abbreviation (from Latin brevis, meaning short) is a shortened form of a word or phrase, by any method.
ea. written abbreviation for each: used to give the price, weight, etc.
I would do the filtering in the recursive CTE:
with potential_abbreviations(col, abbr, lev) as (
select col, col as abbr, 1 as lev
from tab
union all
select pa.col, substr(pa.abbr, 1, length(pa.abbr) - 1) as abbr, lev + 1
from potential_abbreviations pa
where length(abbr) > 1 and
not exists (select 1
from tab
where tab.col like substr(pa.abbr, 1, length(pa.abbr) - 1) || '%' and
tab.col <> pa.col
)
)
select pa.col, pa.abbr
from (select pa.*, row_number() over (partition by pa.col order by pa.lev desc) as seqnum
from potential_abbreviations pa
) pa
where seqnum = 1
Here is a db<>fiddle.
The lev
is strictly not necessary. You can use length(abbr) desc
in the order by
. But, I usually include a recursion counter when I use recursive CTEs, so this is habit.
Doing the extra comparison in the CTE may look more complicated, but it simplifies the execution -- the recursion stops at the correct value.
This is also tested on unique single letter col
values.
This is actually possible using a recursive CTE. I don't really get it shorter than three subqueries (plus one query), but at least it is not constrained by string length. The steps are roughly as follows:
Table:
col abbr
--- -------
AAA AAA
AAA AA
AAA A
...
Table
ABBR CONFLICT
---- --------
AA 3
AAA 2
AABA 1
...
AAA
conflicts with some other abbreviation but still must be chosen as it is equal to its unshortened name.Table
COL ABBR CONFLICT POS
-------------------------------
AAA AAA 2 1
AAAAAA AAAA 1 1
AAAAAA AAAAA 1 2
AAAAAA AAAAAA 1 3
AABA AAB 1 1
...
Table
COL ABBR POS
-------------------
AAA AAA 1
AAAAAA AAAA 1
AABA AAB 1
...
This results in the following SQL, with the above steps as CTEs:
with potential_abbreviations(col,abbr) as (
select
col
, col as abbr
from tab
union all
select
col
, substr(abbr, 1, length(abbr)-1 ) as abbr
from potential_abbreviations
where length(abbr) > 1
)
, abbreviation_counts as (
select abbr
, count(*) as conflict
from potential_abbreviations
group by abbr
)
, all_unique_abbreviations(col,abbr,conflict,pos) as (
select
p.col
, p.abbr
, conflict
, rank() over (partition by col order by p.abbr) as pos
from potential_abbreviations p
join abbreviation_counts c on p.abbr = c.abbr
where conflict = 1 or p.col = p.abbr
)
select col, abbr, pos
from all_unique_abbreviations
where pos = 1
order by col, abbr
COL ABBR
------- ----
AAA AAA
AAAAAA AAAA
AABA AAB
AB AB
AC1 AC
AD AD
BAAAA BA
BBAA BB
COL1 COL1
COL21 COL2
SQL Fiddle
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With