Yesterday in a job interview session I was asked this question and I had no clue about it. Suppose I have a word "Manhattan " I want to display only the letters 'M','A','N','H','T' in SQL. How to do it? Any help is appreciated.
Well, here is my solution (sqlfiddle) - it aims to use a "Relational SQL" operations, which may have been what the interviewer was going for conceptually.
Most of the work done is simply to turn the string into a set of (pos, letter) records as the relevant final applied DQL is a mere SELECT with a grouping and ordering applied.
select letter
from (
-- All of this just to get a set of (pos, letter)
select ns.n as pos, substring(ss.s, ns.n, 1) as letter
from (select 'MANHATTAN' as s) as ss
cross join (
-- Or use another form to create a "numbers table"
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9)) as X(n)
) as ns
) as pairs
group by letter -- guarantees distinctness
order by min(pos) -- ensure output is ordered MANHT
The above query works in SQL Server 2008, but the "Numbers Table" may have to be altered for other vendors. Otherwise, there is nothing used that is vendor specific - no CTE, or cross application of a function, or procedural language code ..
That being said, the above is to show a conceptual approach - SQL is designed for use with sets and relations and multiplicity across records; the above example is, in some sense, merely a perversion of such.
Examining the intermediate relation,
select ns.n as pos, substring(ss.s, ns.n, 1) as letter
from (select 'MANHATTAN' as s) as ss
cross join (
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9)) as X(n)
) as ns
uses a cross join to generate the Cartesian product of the string (1 row) with the numbers (9 rows); the substring
function is then applied with the string and each number to obtain each character in accordance with its position. The resulting set contains the records-
POS LETTER
1 M
2 A
3 N
..
9 N
Then the outer select groups each record according to the letter and the resulting records are ordered by the minimum (first) occurrence position of the letter that establishing the grouping. (Without the order by the letters would have been distinct but the final order would not be guaranteed.)
One way (if using SQL Server) is with a recursive CTE (Commom Table Expression).
DECLARE @source nvarchar(100) = 'MANHATTAN'
;
WITH cte AS (
SELECT SUBSTRING(@source, 1, 1) AS c1, 1 as Pos
WHERE LEN(@source) > 0
UNION ALL
SELECT SUBSTRING(@source, Pos + 1, 1) AS c1, Pos + 1 as Pos
FROM cte
WHERE Pos < LEN(@source)
)
SELECT DISTINCT c1 from cte
SqlFiddle for this is here. I had to inline the @source
for SqlFiddle, but the code above works fine in Sql Server.
The first SELECT
generates the initial row(in this case 'M', 1). The second SELECT
is the recursive part that generates the subsequent rows, with the Pos
column getting incremented each time until the termination condition WHERE Pos < LEN(@source)
is finally met. The final select removes the duplicates. Internally, SELECT DISTINCT
sorts the rows in order to facilitate the removal of duplicates, which is why the final output happens to be in alphabetic order. Since you didn't specify order as a requirement, I left it as-is. But you could modify it to use a GROUP
instead, that ordered on MIN(Pos)
if you needed the output in the characters' original order.
This same technique can be used for things like generating all the Bigrams for a string, with just a small change to the general structure above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With