Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find which character occurs the greatest number of times in a string

Tags:

sql

oracle

With Oracle SQL query, can we do the following?

      Input       Output
    'aaaabcd' --->  'a'
    '0001001' --->  '0'

That is, find the character which is occurring the greatest number of times in the string?

like image 844
Barshan Das Avatar asked Jan 09 '23 20:01

Barshan Das


2 Answers

Yes, this is possible through the use of CONNECT BY. A bit complicated, though:

SELECT xchar, xcount FROM (
    SELECT xchar, COUNT(*) AS xcount, RANK() OVER ( ORDER BY COUNT(*) DESC) AS rn
      FROM (
        SELECT SUBSTR('aaaabcd', LEVEL, 1) AS xchar
          FROM dual
       CONNECT BY LEVEL <= LENGTH('aaaabcd')
   ) GROUP BY xchar
) WHERE rn = 1;

What we do in the innermost query is break the string into its individual characters. Then we just get the COUNT() grouped by the character, and use RANK() to find the max (note that this will return more than one result if there is a tie for the most frequently occurring character).

The above query returns both the character appearing most often and the number of times it appears.

If you have a table of multiple strings, then you'll want to do something like the following:

WITH strlen AS (
  SELECT LEVEL AS strind
    FROM dual
 CONNECT BY LEVEL <= 30
)
SELECT id, xchar, xcount FROM (
    SELECT id, xchar, COUNT(*) AS xcount, RANK() OVER ( PARTITION BY id ORDER BY COUNT(*) DESC) AS rn
      FROM (
        SELECT s.id, SUBSTR(s.str, sl.strind, 1) AS xchar
          FROM strings s, strlen sl
         WHERE LENGTH(s.str) >= sl.strind
   ) GROUP BY id, xchar
) WHERE rn = 1;

where 30 is a magic number that is equal to the length of your longest string, or greater. See SQL Fiddle here. Alternately, you could do the following to avoid the magic number:

WITH strlen AS (
  SELECT LEVEL AS strind
    FROM dual
 CONNECT BY LEVEL <= ( SELECT MAX(LENGTH(str)) FROM strings )
)
SELECT id, xchar, xcount FROM (
    SELECT id, xchar, COUNT(*) AS xcount, RANK() OVER ( PARTITION BY id ORDER BY COUNT(*) DESC) AS rn
      FROM (
        SELECT s.id, SUBSTR(s.str, sl.strind, 1) AS xchar
          FROM strings s, strlen sl
         WHERE LENGTH(s.str) >= sl.strind
   ) GROUP BY id, xchar
) WHERE rn = 1;

Updated SQL Fiddle.

like image 120
David Faber Avatar answered Jan 11 '23 22:01

David Faber


Here's one way - assuming you want to show all rows that have the highest number of characters per string:

with sample_data as (select 'aaaabcd' str from dual union all
                     select '0001001' str from dual union all
                     select '11002' str from dual),
         pivoted as (select str, substr(str, level, 1) letter
                     from   sample_data
                     connect by level <= length(str)
                                and prior str = str
                                and prior dbms_random.value is not null),
             grp as (select str, letter, count(*) cnt
                     from   pivoted
                     group by str, letter),
          ranked as (select str,
                            letter,
                            dense_rank() over (partition by str order by cnt desc) dr
                     from   grp)
select str, letter
from   ranked
where  dr = 1;

STR     LETTER
------- ------
0001001 0     
11002   1     
11002   0     
aaaabcd a     

If you wanted to only show one of the letters in the event of a tie, change the dense_rank() in the query above for a row_number.

If you wanted to show all tied letters in a single row (e.g. comma separated) then use listagg in the final query to group the rows into one.

like image 22
Boneist Avatar answered Jan 11 '23 20:01

Boneist