Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep missing values with nodupkey

Tags:

sas

I have a database in which some of the observations have an identifier ident, and some not. I want to create a new database in which I have dropped the observations which are duplicates of my ident variable, but to keep the observations where ident is missing.

If I simply do a proc sort nodupkey

proc sort nodupkey data=have;
    by ident;
    run;

Then it also eliminates the missing values. Is there a simple way to do that (that is not break the dataset, proc sort nodupkey one partn, then assemble it again)

like image 267
Anthony Martin Avatar asked Jun 21 '26 00:06

Anthony Martin


1 Answers

You have a couple of options when removing duplicates.

First off, dupout=<dataset> on the proc sort will send all of your duplicates to another dataset, and if you want to then do something with them you can. But this is a back-end version of your 'break the dataset', just probably faster as it only breaks the smaller part.

Simpler is to do the dedup yourself.

proc sort data=have;
  by ident;
run;

data want;
  set have;
  by ident;
  if (first.ident) or missing(ident);
run;

That keeps the first record for each ident, plus any record with ident missing.

like image 185
Joe Avatar answered Jun 23 '26 18:06

Joe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!