Multiple hash objects in SAS

Tags:

sas

I have two SAS data sets. The first is relatively small, and contains unique dates and a corresponding ID:

date   dateID
1jan90     10
2jan90     15
3jan90     20
...

The second data set is very large and has two date variables:

dt1     dt2
1jan90  2jan90
3jan90  1jan90
...

I need to match both dt1 and dt2 to dateID, so the output would be:

id1  id2
10   15
20   10

Efficiency is very important here. I know how to use a hash object to do one match, so I could do one data step to do the match for dt1 and then another step for dt2, but I'd like to do both in one data step. How can this be done?

Here's how I would do the match for just dt1:

data tbl3;
 if 0 then set tbl1 tbl2;

 if _n_=1 then do;
  declare hash dts(dataset:'work.tbl2');
  dts.DefineKey('date');
  dts.DefineData('dateid');
  dts.DefineDone();
 end;

 set tbl1;
 if dts.find(key:dt1)=0 then output;
 run;
itzy asked Sep 13 '12 02:09




3 Answers

A format would probably work just as efficiently given the size of your hash table...

data fmt ;
retain fmtname 'DTID' type 'N' ;
set tbl1 ;
start = date ;
label = dateid ;
run ;
proc format cntlin=fmt ; run ;

data tbl3 ;
  set tbl2 ;
  id1 = put(dt1,DTID.) ;
  id2 = put(dt2,DTID.) ;
run ;

Edited version based on below comments...

data fmt ;
retain fmtname 'DTID' type 'I' ;
set tbl1 end=eof ;
start = date ;
label = dateid ;
output ;
if eof then do ;
  hlo = 'O' ;
  label = . ;
  output ;
end ;
run ;
proc format cntlin=fmt ; run ;

data tbl3 ;
  set tbl2 ;
  id1 = input(dt1,DTID.) ;
  id2 = input(dt2,DTID.) ;
run ;
Chris J answered Sep 21 '22 09:09



I don't have SAS in front of me right now to test it but the code would look like this:

 data tbl3;
   if 0 then set tbl1 tbl2;

   if _n_=1 then do;
     declare hash dts(dataset:'work.tbl2');
     dts.DefineKey('date');
     dts.DefineData('dateid');
     dts.DefineDone();
   end;

   set tbl1;

   date = dt1;
   if dts.find()=0 then do;
     id1 = dateId;
   end;

   date = dt2;
   if dts.find()=0 then do;
     id2 = dateId;
   end;

   if not missing(id1) or not missing(id2) then output; * KEEP ONLY RECORDS THAT MATCHED AT LEAST ONE;

   drop date dateId;
 run;
Robert Penridge answered Sep 18 '22 09:09



I agree with the format solution, but if you want the hash solution, here it is. The key point is that you pass the variable you're matching (dt1, then dt2) as the key argument to find(), rather than relying on the key variable defined in the hash itself.

data tbl2;
informat date DATE7.;
input date   dateID;
datalines;
01jan90     10
02jan90     15
03jan90     20
;;;;
run;

data tbl1;
informat dt1 dt2 DATE7.;
input dt1     dt2;
datalines;
01jan90  02jan90
03jan90  01jan90
;;;;
run;

data tbl3;
 if 0 then set tbl1 tbl2;

 if _n_=1 then do;
  declare hash dts(dataset:'work.tbl2');
  dts.DefineKey('date');
  dts.DefineData('dateid');
  dts.DefineDone();
 end;

 set tbl1;
 rc1 = dts.find(key:dt1);
 if rc1=0 then id1=dateID;
 rc2 = dts.find(key:dt2);
 if rc2=0 then id2=dateID;
 if rc1=0 and rc2=0 then output;
 run;
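
If unmatched rows should be kept rather than dropped, the same single-hash lookup can be adapted along these lines. This is only a sketch I have not run: the output name tbl3_all is made up for illustration, and it assumes the same tbl1 and tbl2 as above.

```sas
data tbl3_all;                      /* hypothetical output name */
  if 0 then set tbl2;               /* puts date and dateID in the PDV for the hash */

  if _n_ = 1 then do;
    declare hash dts(dataset: 'work.tbl2');
    dts.defineKey('date');
    dts.defineData('dateID');
    dts.defineDone();
  end;

  set tbl1;

  /* dateID keeps its previous value when find() fails,  */
  /* so only copy it when the return code is 0 (a match). */
  if dts.find(key: dt1) = 0 then id1 = dateID;
  else id1 = .;

  if dts.find(key: dt2) = 0 then id2 = dateID;
  else id2 = .;

  drop date dateID;                 /* keep only dt1, dt2, id1, id2 */
run;
```

Every row of tbl1 is written out here, with id1 or id2 left missing wherever the date was not found in tbl2.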
Joe answered Sep 19 '22 09:09
