
Setting *most* variables to missing, while preserving the contents of a select few

I have a dataset like this (but with several hundred vars):

id  q1  g7  q3  b2  zz  gl  az  tre
1   1   2   1   1   1   2   1   1
2   2   3   3   2   2   2   1   1
3   1   2   3   3   2   1   3   3
4   3   1   2   2   3   2   1   1
5   2   1   2   2   1   2   3   3
6   3   1   1   2   2   1   3   3

I'd like to keep id, b2, and tre, but set everything else to missing. In a dataset this small, I can easily use call missing (q1, g7, q3, zz, gl, az) - but in a set with many more variables, I would effectively like to say call missing (of _ALL_ *except ID, b2, tre*).

Obviously, SAS can't read my mind. I've considered workarounds that involve another data step or proc sql, where I copy the variables I want to keep to a new dataset and merge them back on afterward, but I'm trying to find a more elegant solution.

J.Q asked Dec 18 '22 20:12


2 Answers

This technique uses an unexecuted SET statement (it acts at compile time only) to define every variable in the original data set. That preserves the variable order and all variable attributes: type, length, label, format, and so on. The second SET statement, which does execute, reads only the variables that are NOT to be set to missing. Nothing is explicitly set to missing, but the result is the same, because the variables that are never read keep their initial missing values.

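    /* Sample data from the question */
    data nomiss;
        input id q1 g7 q3 b2 zz gl az tre;
        cards;
    1   1   2   1   1   1   2   1   1
    2   2   3   3   2   2   2   1   1
    3   1   2   3   3   2   1   3   3
    4   3   1   2   2   3   2   1   1
    5   2   1   2   2   1   2   3   3
    6   3   1   1   2   2   1   3   3
    ;;;;
    run;

    proc print data=nomiss;
    run;

    data manymiss;
        if 0 then set nomiss;          /* never executes; defines all variables at compile time */
        set nomiss(keep=id b2 tre);    /* reads only the variables to preserve */
    run;

    proc print data=manymiss;
    run;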
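
The same idea also carries over to data sets that mix numeric and character variables, which is where preserving attributes matters most. The following is a minimal sketch under assumed names: the data sets `have` and `want` and the variables `name` and `score` are hypothetical and not part of the question. PROC CONTENTS is included only to inspect the type, format, and position of each variable in the result.

```sas
/* Hypothetical input with one character variable and one formatted numeric */
data have;
    input id name $ score;
    format score 8.2;
    cards;
1 Ann 90
2 Bob 85
;;;;
run;

data want;
    if 0 then set have;        /* compile time only: defines id, name, score in order */
    set have(keep=id score);   /* executes: reads just the variables to preserve */
run;

/* Inspect attributes and values: name should still be character, but blank */
proc contents data=want varnum;
run;

proc print data=want;
run;
```

Because variables that come from a SET statement are not reset between iterations and `name` is never actually read, it stays at its initial missing value on every row.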

data _null_ answered Jan 17 '23 15:01


Another fairly simple option is to set them missing using a macro, and basic code writing techniques.

For example, let's say we have a macro:

%macro call_missing(var=);
  call missing(&var.);
%mend call_missing;

Now we can write a query that uses dictionary.columns to identify the variables we want set to missing:

proc sql;
  select name 
    from dictionary.columns
    where libname='WORK' and memname='HAVE'       /* libname and memname are stored in uppercase */
    and upcase(name) not in ('ID','B2','TRE');    /* name keeps its original case, so upcase it */
quit;

Now, we can combine these two things to get a macro variable containing code we want, and use that:

proc sql;
  select cats('%call_missing(var=',name ,')')
    into :misslist separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='HAVE'       /* libname and memname are stored in uppercase */
    and upcase(name) not in ('ID','B2','TRE');    /* name keeps its original case, so upcase it */
quit;

data want;
  set have;
  &misslist.;
run;

This has the advantage that it doesn't care about the variable types, nor the order. It has the disadvantage that it's somewhat more code, but it shouldn't be particularly long.
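
If the wrapper macro feels like more machinery than needed, one possible simplification is to build a comma-separated list of names and pass it to a single CALL MISSING, which accepts numeric and character variables together. This is only a sketch, not part of the original answer. It assumes the same `WORK.HAVE` data set, ordinary variable names (no name literals), and that at least one variable falls outside the keep list, since otherwise `misslist` would never be created.

```sas
proc sql noprint;
    /* Collect every variable name except the ones to preserve */
    select name
        into :misslist separated by ', '
        from dictionary.columns
        where libname = 'WORK' and memname = 'HAVE'
          and upcase(name) not in ('ID', 'B2', 'TRE');
quit;

data want;
    set have;
    call missing(&misslist.);   /* one call covers the whole list */
run;
```

The trade-off is the same as before: the list lives in a macro variable, so it is bounded by the maximum macro variable length, which should be ample for several hundred names.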

Joe answered Jan 17 '23 16:01