With Oracle SQL query, can we do the following?
Input Output
'aaaabcd' ---> 'a'
'0001001' ---> '0'
That is, find the character which is occurring the greatest number of times in the string?
Yes, this is possible through the use of CONNECT BY
. A bit complicated, though:
SELECT xchar, xcount FROM (
SELECT xchar, COUNT(*) AS xcount, RANK() OVER ( ORDER BY COUNT(*) DESC) AS rn
FROM (
SELECT SUBSTR('aaaabcd', LEVEL, 1) AS xchar
FROM dual
CONNECT BY LEVEL <= LENGTH('aaaabcd')
) GROUP BY xchar
) WHERE rn = 1;
What we do in the innermost query is break the string into its individual characters. Then we just get the COUNT()
grouped by the character, and use RANK()
to find the max (note that this will return more than one result if there is a tie for the most frequently occurring character).
The above query returns both the character appearing most often and the number of times it appears.
If you have a table of multiple strings, then you'll want to do something like the following:
WITH strlen AS (
SELECT LEVEL AS strind
FROM dual
CONNECT BY LEVEL <= 30
)
SELECT id, xchar, xcount FROM (
SELECT id, xchar, COUNT(*) AS xcount, RANK() OVER ( PARTITION BY id ORDER BY COUNT(*) DESC) AS rn
FROM (
SELECT s.id, SUBSTR(s.str, sl.strind, 1) AS xchar
FROM strings s, strlen sl
WHERE LENGTH(s.str) >= sl.strind
) GROUP BY id, xchar
) WHERE rn = 1;
where 30
is a magic number that is equal to the length of your longest string, or greater. See SQL Fiddle here. Alternately, you could do the following to avoid the magic number:
WITH strlen AS (
SELECT LEVEL AS strind
FROM dual
CONNECT BY LEVEL <= ( SELECT MAX(LENGTH(str)) FROM strings )
)
SELECT id, xchar, xcount FROM (
SELECT id, xchar, COUNT(*) AS xcount, RANK() OVER ( PARTITION BY id ORDER BY COUNT(*) DESC) AS rn
FROM (
SELECT s.id, SUBSTR(s.str, sl.strind, 1) AS xchar
FROM strings s, strlen sl
WHERE LENGTH(s.str) >= sl.strind
) GROUP BY id, xchar
) WHERE rn = 1;
Updated SQL Fiddle.
Here's one way - assuming you want to show all rows that have the highest number of characters per string:
with sample_data as (select 'aaaabcd' str from dual union all
select '0001001' str from dual union all
select '11002' str from dual),
pivoted as (select str, substr(str, level, 1) letter
from sample_data
connect by level <= length(str)
and prior str = str
and prior dbms_random.value is not null),
grp as (select str, letter, count(*) cnt
from pivoted
group by str, letter),
ranked as (select str,
letter,
dense_rank() over (partition by str order by cnt desc) dr
from grp)
select str, letter
from ranked
where dr = 1;
STR LETTER
------- ------
0001001 0
11002 1
11002 0
aaaabcd a
If you wanted to only show one of the letters in the event of a tie, change the dense_rank()
in the query above for a row_number
.
If you wanted to show all tied letters in a single row (e.g. comma separated) then use listagg in the final query to group the rows into one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With