
SAS Hash object sum

Tags: sas

I am trying to understand the sum() function of SAS Hash object. To my understanding, suminc: defines the variable the SAS hash object keeps track of while sum() will sum up the values of that variable.

Suppose I have the data set

data sample;
    input id x;
    datalines;
1 350
1 220
1 300
2 300
2 500
;
run;

I want the aggregation to be

id x_sum
2  800
1  870

However, my hash code:

data _null_;
    set sample end= done;

    length x_sum 8;

    if _N_ = 1 then 
    do;
        declare hash T(suminc:"x");
        T.definekey("id");
        T.definedata("id");
        T.definedata("x_sum");
        T.definedone();
    end;


    T.ref();

    T.sum(sum:x_sum);

    put _all_;


    T.replace();

    if done then T.output(dataset: "my_set");

run;

outputs:

id x_sum
2  800
1  520

as a data set and to the log:

done=0 id=1 x=350 x_sum=350 _ERROR_=0 _N_=1
done=0 id=1 x=220 x_sum=570 _ERROR_=0 _N_=2
**done=0 id=1 x=300 x_sum=520 _ERROR_=0 _N_=3**
done=0 id=2 x=300 x_sum=300 _ERROR_=0 _N_=4
done=1 id=2 x=500 x_sum=800 _ERROR_=0 _N_=5

Can anyone explain to me what is happening?

UPDATE AFTER ALL THE COMMENTS:

Hi all, I am totally new to Stack Overflow, so I am still figuring out this "tick off answer" system... I felt everyone contributed something.

Anyway, after a lot of experiments, I figured out what was happening -

Basically, whenever .sum() .replace() is called, the sum counter is reset to zero. This, and not really replace() etc., is the reason why the results were what they were: the sum() reset my count, and so ref() was only ever summing the previous 2 observations.

Hope this is useful information to everyone. If others have any insight, please do share.

Matt asked Dec 19 '25 14:12


1 Answer

The problem is your use of replace(). From the docs (SAS 9.3 Language Reference: Concepts, "Using the Hash Object"):

This SUMINC tag instructs the hash object to allocate internal storage for maintaining a summary value for each key. The summary value of a hash key is initialized to the value of the SUMINC variable whenever the ADD or REPLACE method is used. The summary value of a hash key is incremented by the value of the SUMINC variable whenever the FIND, CHECK, or REF method is used.

I think an important point is that the "summary value" is not the DATA step variable x_sum, nor x_sum stored as a data variable in the hash table. It is stored outside of the hash table's defined data. It's ancillary information that is really an attribute of the key (in my head, at least).
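To illustrate that the summary value lives alongside the key rather than in the defined data, here is a minimal sketch (my own, not taken from the quoted docs) that never stores x_sum in the hash at all. It only calls ref() while reading the data, then walks the keys with an iterator and asks sum() for each key's summary value at the end. The names it and rc are just illustrative, and ordered:"a" is an assumption added so the output comes back sorted by id.

```sas
data my_set(keep=id x_sum);
    length x_sum 8;

    /* suminc:"x" keeps a per-key summary value outside the defined data */
    declare hash T(suminc:"x", ordered:"a");
    T.definekey("id");
    T.definedata("id");
    T.definedone();
    declare hiter it("T");

    /* ref() adds a new key or increments the summary of an existing one */
    do until (done);
        set sample end=done;
        T.ref();
    end;

    /* walk the keys; sum() copies each key's summary value into x_sum */
    rc = it.first();
    do while (rc = 0);
        T.sum(sum: x_sum);
        output;
        rc = it.next();
    end;
    stop;
run;
```

Because replace() is never called here, nothing re-initializes the summary value, so each key's total should reflect every x that ref() saw for that id.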

If you comment out the replace(), your code works (you get the right value for x_sum in the PDV), but then x_sum is never written to the hash table. So you called replace() to write x_sum to the hash table, which has the unfortunate side effect of re-initializing the summary value to the current value of x. That matches your log: on the third observation the summary value had been reset to 220 by the previous replace(), and ref() then added 300 to give 520. I think a workaround is to assign x=x_sum before you call replace(). That way, when replace() re-initializes the summary value to the value of x, x holds the current summary value. It's hard for me to put into words, but see the code below, where I added just one statement.

data _null_;
  set sample end= done;

  length x_sum 8;

  if _N_ = 1 then 
  do;
    declare hash T(suminc:"x");
    T.definekey("id");
    T.definedata("id");
    T.definedata("x_sum");
    T.definedone();
  end;
  T.ref();
  T.sum(sum:x_sum);
  put _all_;

  x=x_sum;  *Replace method will initialize the summary value to x! ;

  T.replace();
  if done then T.output(dataset: "my_set");
run;
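
For comparison, here is a sketch of a different technique that avoids suminc altogether, in case the goal is simply a per-id total stored in the hash data. This is not the SUMINC mechanism from the docs quoted above; it is the plain find/accumulate/replace pattern, and it assumes x_sum is kept as an ordinary data variable in the hash.

```sas
data _null_;
    set sample end=done;

    length x_sum 8;

    if _N_ = 1 then
    do;
        declare hash T();           /* no suminc: the total lives in the data */
        T.definekey("id");
        T.definedata("id", "x_sum");
        T.definedone();
    end;

    /* find() loads the stored x_sum for this id; start from 0 for a new id */
    if T.find() ne 0 then x_sum = 0;

    x_sum = x_sum + x;              /* accumulate the running total */

    T.replace();                    /* write the updated total back */

    if done then T.output(dataset: "my_set");
run;
```

Here replace() has no summary value to reset, so the running total is whatever was last written to the hash for that id.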
Quentin answered Dec 23 '25 07:12


