
How to remove duplicate records/observations WITHOUT sorting in SAS?

I wonder if there is a way to remove duplicate records WITHOUT sorting. Sometimes I want to keep the original order and just remove the duplicates.

Is it possible?

BTW, below are the two methods I know for removing duplicates; both end up sorting the data.

1.

proc sql;
   create table yourdata_nodupe as
   select distinct *
   From abc;
quit;

2.

proc sort data=YOURDATA nodupkey;    
    by var1 var2 var3 var4 var5;    
run;
mj023119 asked Apr 18 '11 03:04



1 Answer

You could use a hash object to keep track of which values have been seen as you pass through the data set. Only output when you encounter a key that hasn't been observed yet. This outputs in the order the data was observed in the input data set.

Here is an example using the input data set "sashelp.cars", keeping the first row for each value of Make. The original data is in alphabetical order by Make, so you can see that the output data set "nodupes" maintains that same order.

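data nodupes (drop=rc);
  /* Define the key variable before the hash object refers to it */
  length Make $13;

  /* Hash object that holds every key value seen so far */
  declare hash found_keys();
    found_keys.definekey('Make');
    found_keys.definedone();

  /* Read the whole input within a single pass of the DATA step */
  do while (not done);
    set sashelp.cars end=done;
    rc=found_keys.check();   /* 0 means the key is already in the hash */
    if rc^=0 then do;
      rc=found_keys.add();   /* first occurrence: remember the key */
      output;                /* and keep the row */
    end;
  end;
  stop;
run;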

proc print data=nodupes;run;
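
The question's PROC SORT example removes duplicates on several BY variables, so here is a minimal sketch of the same hash idea with a composite key. The data set names "have" and "want" and the variables id, grp and value are made up for illustration; they are not from the question. It relies on the add() method returning a nonzero code, rather than writing an error to the log, when the key already exists and the return code is assigned to a variable.

```sas
/* Hypothetical input: unsorted rows with repeated (id, grp) pairs */
data have;
  input id grp $ value;
  datalines;
3 b 10
1 a 20
3 b 30
2 c 40
1 a 50
;
run;

data want (drop=rc);
  /* Never executes, but puts the variables of "have" into the PDV */
  /* so the hash object can use them as keys                        */
  if 0 then set have;

  /* Build the hash once, on the first iteration of the DATA step */
  if _n_ = 1 then do;
    declare hash seen();
    seen.definekey('id', 'grp');   /* composite key: id and grp together */
    seen.definedone();
  end;

  set have;

  /* add() returns 0 only when the key was not in the hash yet, */
  /* so the separate check() call can be skipped                */
  rc = seen.add();
  if rc = 0 then output;
run;

proc print data=want; run;
```

With this input, the sketch should keep the first row for each (id, grp) pair in input order: 3 b 10, then 1 a 20, then 2 c 40. Listing every variable in definekey() would mimic "select distinct *", at the cost of holding every distinct row in memory.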
cmjohns answered Sep 22 '22 19:09