Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using distinct with stuff/for xml path('')

I'd like to put together only unique values in the concatenated string. My code is currently:

select rc.Routage
    , COUNT(distinct rc.Event)
    , STUFF((select ', ' + cast(rcA.Event as varchar)
            from Receiving rcA
            where rcA.supplier = 'user'
            for xml path(''))
        , 1, 1, '')
from Receiving rc
where rc.supplier = 'user'
group by rc.Routage
order by COUNT(distinct rc.Event)desc

This gives me the output I'd expect, but I would like to eliminate the duplicate values in the stuff/for xml path field.

I've tried various combinations of distinct and group by in the stuff/xml section, but can't piece it together properly.

To clarify, for COUNT(distinct rc.Event) = 2, I would like to see 2 distinct events from the stuff clause. How can I do this?

like image 221
Preston Avatar asked Mar 23 '17 11:03

Preston


People also ask

Can distinct be used with where clause?

By using the WHERE clause with a DISTINCT clause in MySQL queries, we are putting a condition on the basis of which MySQL returns the unique rows of the result set.

Is it better to use distinct or GROUP BY?

The group gives the same result as distinct when no aggregate function is present. GROUP BY is required if you're aggregating data, but in many cases, DISTINCT is simpler to write and read if you aren't aggregating data.

Where do you put distinct?

The distinct keyword is used in conjunction with select keyword. It is helpful when there is a need of avoiding duplicate values present in any specific columns/table. When we use distinct keyword only the unique values are fetched.

What does for XML Path do?

We can use FOR XML PATH to prepare a comma-separated string from the existing data. Let's create an Authors table and insert a few records into it. In the data, we can see we have an ID column and the AuthorName column. If we just select the records, it gives the output in the following format.


2 Answers

Do the distinct in a subquery, before the XML processing gets anywhere near it:

select rc.Routage
    , COUNT(distinct rc.Event)
    , STUFF((select ', ' + cast(rcA.Event as varchar)

            from (select distinct Event from Receiving a
                  where supplier = 'user'
                   and DATETIME > '20170322'
                   and rc.Routage=a.Routage
            ) rcA

            for xml path(''))
        , 1, 1, '')
from Receiving rc
where rc.supplier = 'user'
    and rc.DATETIME > '20170322'
group by rc.Routage
order by COUNT(distinct rc.Event)desc
like image 84
Damien_The_Unbeliever Avatar answered Oct 04 '22 14:10

Damien_The_Unbeliever


Use select distinct in the subquery:

select rc.Routage,
       count(distinct rc.Event),
       stuff((select distinct ', ' + cast(rcA.Event as varchar(max))
              from Receiving rcA
              where rcA.supplier = 'user' and
                    rcA.DATETIME > '20170322' and
                    rc.Routage = rcA.Routage
              for xml path('')
             ), 1, 2, '')
from Receiving rc
where rc.supplier = 'user' and rc.DATETIME > '20170322'
group by rc.Routage;

Notes:

  • In SQL Server, never use varchar() (or related types) without a length. The default varies by context and you are (potentially) introducing a bug that is really hard to find.
  • You want the stuff() to remove two characters, not 1, because you have a comma followed by a space.
  • This formulation assumes that Event does not have XML special characters. It is easy to tweak if that is an issue.

Also, this type of query is usually faster if you eliminate the duplicates in a subquery:

select rc.Routage, rc.numEvents,
       stuff((select distinct ', ' + cast(rcA.Event as varchar(max))
              from Receiving rcA
              where rcA.supplier = 'user' and
                    rcA.DATETIME > '20170322' and
                    rc.Routage = rcA.Routage
              for xml path(''), type
             ).value('.', 'varchar(max)'
                    ), 1, 2, ''
            )
from (select rc.Routage, count(distinct rc.Event) as numEvents
      from Receiving rc
      where rc.supplier = 'user' and rc.DATETIME > '20170322'
      group by rc.Routage
     ) rc;
like image 26
Gordon Linoff Avatar answered Oct 04 '22 14:10

Gordon Linoff