Multiple hash objects in SAS

Tags:

sas

I have two SAS data sets. The first is relatively small, and contains unique dates and a corresponding ID:

date   dateID
1jan90     10
2jan90     15
3jan90     20
...

The second data set is very large and has two date variables:

dt1     dt2
1jan90  2jan90
3jan90  1jan90
...

I need to match both dt1 and dt2 to dateID, so the output would be:

id1  id2
10   15
20   10

Efficiency is very important here. I know how to use a hash object to do one match, so I could do one data step to do the match for dt1 and then another step for dt2, but I'd like to do both in one data step. How can this be done?

Here's how I would do the match for just dt1:

data tbl3;
 if 0 then set tbl1 tbl2;

 if _n_=1 then do;
  declare hash dts(dataset:'work.tbl2');
  dts.DefineKey('date');
  dts.DefineData('dateid');
  dts.DefineDone();
 end;

 set tbl1;
 if dts.find(key:dt1)=0 then output;
 run;

asked Sep 13 '12 02:09

itzy

3 Answers

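A format would probably work just as efficiently, given the small size of your lookup table. Note that PUT returns a character value, so id1 and id2 are character variables in this version: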

data fmt ;
retain fmtname 'DTID' type 'N' ;
set tbl1 ;
start = date ;
label = dateid ;
run ;
proc format cntlin=fmt ; run ;

data tbl3 ;
  set tbl2 ;
  id1 = put(dt1,DTID.) ;
  id2 = put(dt2,DTID.) ;
run ;

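Edited version based on the comments below. It builds a numeric informat instead, so the IDs stay numeric and dates with no match become missing:

data fmt ;
  retain fmtname 'DTID' type 'I' ;  /* type 'I' = numeric informat */
  set tbl1 end=eof ;
  start = date ;
  label = dateid ;
  output ;
  if eof then do ;
    hlo = 'O' ;   /* 'O' = OTHER: catch-all row for unmatched dates */
    label = . ;   /* unmatched dates map to missing */
    output ;
  end ;
run ;
proc format cntlin=fmt ; run ;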

data tbl3 ;
  set tbl2 ;
  id1 = input(dt1,DTID.) ;
  id2 = input(dt2,DTID.) ;
run ;

answered Sep 21 '22 09:09

Chris J

I don't have SAS in front of me right now to test it, but the code would look like this:

 data tbl3;
   if 0 then set tbl1 tbl2;

   if _n_=1 then do;
     declare hash dts(dataset:'work.tbl2');
     dts.DefineKey('date');
     dts.DefineData('dateid');
     dts.DefineDone();
   end;

   set tbl1;

   date = dt1;
   if dts.find()=0 then do;
     id1 = dateId;
   end;

   date = dt2;
   if dts.find()=0 then do;
     id2 = dateId;
   end;

   if not missing(id1) or not missing(id2) then output; * KEEP ONLY RECORDS THAT MATCHED AT LEAST ONE;

   drop date dateId;
 run;

answered Sep 18 '22 09:09

Robert Penridge

I agree with the format solution, for one, but if you want the hash solution, here it is. The key point is that you pass the variable you're matching as the key argument to find(), rather than relying on the key variable defined in the hash itself.

data tbl2;
informat date DATE7.;
input date   dateID;
datalines;
01jan90     10
02jan90     15
03jan90     20
;;;;
run;

data tbl1;
informat dt1 dt2 DATE7.;
input dt1     dt2;
datalines;
01jan90  02jan90
03jan90  01jan90
;;;;
run;
data tbl3;
 if 0 then set tbl1 tbl2;

 if _n_=1 then do;
  declare hash dts(dataset:'work.tbl2');
  dts.DefineKey('date');
  dts.DefineData('dateid');
  dts.DefineDone();
 end;

 set tbl1;
 rc1 = dts.find(key:dt1);
 if rc1=0 then id1=dateID;
 rc2 = dts.find(key:dt2);
 if rc2=0 then id2=dateID;
 if rc1=0 and rc2=0 then output;
 run;

answered Sep 19 '22 09:09

Joe
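
If rows with an unmatched date should be kept rather than dropped, the hash lookups above could be adapted along these lines. This is an untested sketch that assumes the same layout as the answer above (tbl2 is the small lookup table with date and dateID, tbl1 is the large table with dt1 and dt2). The keep list is an illustrative choice, not something taken from the original answers.

```sas
data tbl3;
  /* Define DATE and DATEID in the PDV without reading any rows */
  if 0 then set tbl2;

  /* Load the small lookup table into memory once */
  if _n_ = 1 then do;
    declare hash dts(dataset: 'work.tbl2');
    dts.defineKey('date');
    dts.defineData('dateid');
    dts.defineDone();
  end;

  set tbl1;

  /* Two lookups against the same hash, one per date variable */
  if dts.find(key: dt1) = 0 then id1 = dateid;
  if dts.find(key: dt2) = 0 then id2 = dateid;

  /* Assumption: only these four variables are wanted in the output */
  keep dt1 dt2 id1 id2;
run;
```

Because id1 and id2 are created by assignment rather than read from a data set, they are reset to missing on every iteration, so a failed lookup leaves the corresponding ID missing instead of carrying over the previous row's value.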
