I have a dataset like this (but with several hundred vars):
id q1 g7 q3 b2 zz gl az tre
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
I'd like to keep id
, b2
, and tre
, but set everything else to missing. In a dataset this small, I can easily use call missing (q1, g7, q3, zz, gl, az)
- but in a set with many more variables, I would effectively like to say call missing (of _ALL_ *except ID, b2, tre*)
.
Obviously, SAS can't read my mind. I've considered workarounds that involve another data step or proc sql where I copy the original variables to a new ds and merge them back on post, but I'm trying to find a more elegant solution.
This technique uses an un-executed set statement (compile time function only) to define all variables in the original data set. Keeps the order and all variable attributes type, labels, format etc. Basically setting all the variables to missing. The next SET statement which will execute brings in only the variables the are NOT to be set to missing. It doesn't explicitly set variables to missing but achieves the same result.
data nomiss;
input id q1 g7 q3 b2 zz gl az tre;
cards;
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
;;;;
run;
proc print;
run;
data manymiss;
if 0 then set nomiss;
set nomiss(keep=id b2 tre:);
run;
proc print;
run;
Another fairly simple option is to set them missing using a macro, and basic code writing techniques.
For example, let's say we have a macro:
%call_missing(var=);
call missing(&var.);
%mend call_missing;
Now we can write a query that uses dictionary.columns
to identify the variables we want set to missing:
proc sql;
select name
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (name in ('ID','B2','TRE')); *note UPCASE for all these;
quit;
Now, we can combine these two things to get a macro variable containing code we want, and use that:
proc sql;
select cats('%call_missing(var=',name ,')')
into :misslist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (name in ('ID','B2','TRE')); *note UPCASE for all these;
quit;
data want;
set have;
&misslist.;
run;
This has the advantage that it doesn't care about the variable types, nor the order. It has the disadvantage that it's somewhat more code, but it shouldn't be particularly long.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With