When using the asterisk in combination with sum and group, the duplicates are not removed as I expect (and as it works in for example mysql):
col1 | country
-----------------
5 | sweden
20 | sweden
30 | denmark
select *, sum(col1) as s from table
group by country
the data returned is:
col1 | country | s
--------------------
5 | sweden | 25
20 | sweden | 25
30 | denmark | 30
instead of what I expected:
col1 | country | s
------------------------
5 | sweden | 25
30 | denmark | 30
If I don't use asterisk (*), the data returned is as I expect it to be.
SELECT country, sum(col1) as s from table
You are correct, SAS does not collapse WHEN you have variables in the statement that are not in the GROUP BY statement. There will be a note to that effect in the log, about your data being merged.
If you want just the variables, you'll have to list them unfortunately, but since you have to list them in GROUP BY it's not extra work per se.
Different SQL implementations handle things differently, this is one way that SAS is different. It's handy when you do want to merge a summary stat back with the main data set though.
If you don't want this behaviour add the NOREMERGE option to your PROC SQL - but it throws an error, it still doesn't work the way you want. See the documentation for the reference
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With