Could someone verify my understanding of proc sql union operations? My interpretation of the differences between outer union and union is the following:
Union deletes duplicate rows, while outer union does not
Union will overlay columns, while outer union, by default, will not.
So, would there be any difference between union all corresponding and outer union corresponding? It seems like "ALL" would remove the first difference, and "CORRESPONDING" would remove the second difference, but I'm concerned there could be an additional difference between the two I'm not seeing.
It turns out there is, actually, a difference: how columns which only exist in one dataset are handled. Outer Union Corresponding
will display columns that appear only in one dataset, not overlaid by position. Union All Corresponding
will not display any columns that appear in only one dataset.
My understanding is that OUTER UNION
and UNION ALL
are effectively if not actually identical. CORR
is needed for either one to guarantee the columns line up; with OUTER UNION
the columns will not stack even if they are identical, while with UNION ALL
the columns always stack even if they are not identical (must be same data type or it will error), and pay no attention at all to column name. In both cases adding CORR
causes them to stack.
Here are some examples:
Not stacking:
proc sql;
select height, weight from sashelp.class
union all
select weight,height from sashelp.class;
select height, weight from sashelp.class
outer union
select height, weight from sashelp.class;
quit;
Stacking:
proc sql;
select height, weight from sashelp.class
union all corr
select weight,height from sashelp.class;
select height, weight from sashelp.class
outer union corr
select height, weight from sashelp.class;
quit;
This SAS doc page does a good job of showing the differences.
Try these code and you will see their difference clearly:
The test data:
data data1;
input id $ expense;
cards;
001 9
001 8
003 7
;
run;
data data2;
input id $ cost;
cards;
001 30
002 80
;
run;
This is result by union all:
proc sql;
select
Id,
sum(Expense) label='COST'
from WORK.data1
group by 1
union all
select
Id,
sum(Cost)
from WORK.data2
group by 1
order by 1,2
;
run;
This is result by outer union corr:
proc sql;
select
Id,
sum(Expense) as Cost
from WORK.data1
group by Id
outer union corr
select
Id,
sum(Cost)
from WORK.data2
group by Id
order by 1,2
;
run;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With