Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split and display distinct letters from a word in SQL?

Tags:

sql

Yesterday in a job interview session I was asked this question and I had no clue about it. Suppose I have a word "Manhattan " I want to display only the letters 'M','A','N','H','T' in SQL. How to do it? Any help is appreciated.

like image 315
StrugglingCoder Avatar asked May 24 '14 04:05

StrugglingCoder


2 Answers

Well, here is my solution (sqlfiddle) - it aims to use a "Relational SQL" operations, which may have been what the interviewer was going for conceptually.

Most of the work done is simply to turn the string into a set of (pos, letter) records as the relevant final applied DQL is a mere SELECT with a grouping and ordering applied.

select letter
from (
  -- All of this just to get a set of (pos, letter)
  select ns.n as pos, substring(ss.s, ns.n, 1) as letter
  from (select 'MANHATTAN' as s) as ss
  cross join (
    -- Or use another form to create a "numbers table"
    select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9)) as X(n)
    ) as ns
  ) as pairs
group by letter    -- guarantees distinctness
order by min(pos)  -- ensure output is ordered MANHT

The above query works in SQL Server 2008, but the "Numbers Table" may have to be altered for other vendors. Otherwise, there is nothing used that is vendor specific - no CTE, or cross application of a function, or procedural language code ..

That being said, the above is to show a conceptual approach - SQL is designed for use with sets and relations and multiplicity across records; the above example is, in some sense, merely a perversion of such.


Examining the intermediate relation,

  select ns.n as pos, substring(ss.s, ns.n, 1) as letter
  from (select 'MANHATTAN' as s) as ss
  cross join (
    select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9)) as X(n)
    ) as ns

uses a cross join to generate the Cartesian product of the string (1 row) with the numbers (9 rows); the substring function is then applied with the string and each number to obtain each character in accordance with its position. The resulting set contains the records-

POS LETTER
1   M
2   A
3   N
..
9   N

Then the outer select groups each record according to the letter and the resulting records are ordered by the minimum (first) occurrence position of the letter that establishing the grouping. (Without the order by the letters would have been distinct but the final order would not be guaranteed.)

like image 118
user2864740 Avatar answered Oct 05 '22 19:10

user2864740


One way (if using SQL Server) is with a recursive CTE (Commom Table Expression).

DECLARE @source nvarchar(100) = 'MANHATTAN'
;
WITH cte AS (
    SELECT SUBSTRING(@source, 1, 1) AS c1, 1 as Pos
    WHERE LEN(@source) > 0
    UNION ALL
    SELECT SUBSTRING(@source, Pos + 1, 1) AS c1, Pos + 1 as Pos
    FROM cte
    WHERE Pos < LEN(@source)
)
SELECT DISTINCT c1 from cte

SqlFiddle for this is here. I had to inline the @source for SqlFiddle, but the code above works fine in Sql Server.

The first SELECT generates the initial row(in this case 'M', 1). The second SELECT is the recursive part that generates the subsequent rows, with the Pos column getting incremented each time until the termination condition WHERE Pos < LEN(@source) is finally met. The final select removes the duplicates. Internally, SELECT DISTINCT sorts the rows in order to facilitate the removal of duplicates, which is why the final output happens to be in alphabetic order. Since you didn't specify order as a requirement, I left it as-is. But you could modify it to use a GROUP instead, that ordered on MIN(Pos) if you needed the output in the characters' original order.

This same technique can be used for things like generating all the Bigrams for a string, with just a small change to the general structure above.

like image 45
hatchet - done with SOverflow Avatar answered Oct 05 '22 20:10

hatchet - done with SOverflow