
Fill the blank values of a variable with the previous non blank value SAS 9.3

Tags:

sas

I'm working with a dataset that looks something like this:

+----------+--------+-------+
| Variable | Level  | Value |
+----------+--------+-------+
| sexe     | men    |    10 |
|          | female |    20 |
| age      | 0-20   |     5 |
|          | 20-40  |     5 |
|          | 40-60  |    10 |
|          | >60    |    10 |
+----------+--------+-------+

I would like to fill the blank cells with the previous non-blank value, to obtain something like this:

+----------+--------+-------+
| Variable | Level  | Value |
+----------+--------+-------+
| sexe     | men    |    10 |
| sexe     | female |    20 |
| age      | 0-20   |     5 |
| age      | 20-40  |     5 |
| age      | 40-60  |    10 |
| age      | >60    |    10 |
+----------+--------+-------+

I tried various approaches in a DATA step, mostly with the LAG() function. The idea was to read the previous row whenever the cell was empty and fill the cell with that value.

DATA test;
   SET test;

   IF variable = . THEN DO;
      variable = LAG1(variable);
   END;
RUN;

This is what I obtained:

+----------+--------+-------+
| Variable | Level  | Value |
+----------+--------+-------+
|          | men    |    10 |
| sexe     | female |    20 |
|          | 0-20   |     5 |
| age      | 20-40  |     5 |
|          | 40-60  |    10 |
|          | >60    |    10 |
+----------+--------+-------+

One problem is that the right string is not always just one row above. But I also don't understand why SAS put blanks in the first and third rows. It shouldn't have modified those rows, because I wrote "IF variable = .". I know how to do this in Python or R with a for loop, but I haven't found a good solution in SAS.

I also tried storing the string in a variable with "CALL SYMPUT", and with "RETAIN", but neither worked.

There must be a simple and elegant way to do this. Any idea?

asked Dec 02 '22 19:12 by jomuller


1 Answer

You can't use LAG inside an IF and get that result; LAG doesn't work the way you think it does. I'd say RETAIN is the correct approach:

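DATA test;
   SET test;
   /* _variable keeps its value from one row to the next */
   retain _variable;
   /* non-blank row: remember the value; blank row: fill it in */
   if not missing(variable) then _variable=variable;
   else variable=_variable;
   drop _variable;
RUN;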
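To try the RETAIN approach end to end, here is a minimal sketch. The input step is an assumption reconstructed from the table in the question, and the dataset names `have` and `want` and the helper variable `_last` are hypothetical names chosen for illustration.

```sas
/* Assumed sample data, rebuilt from the table in the question.
   With DSD, a line that starts with the delimiter gets a blank first field. */
data have;
   infile datalines dsd dlm='|' truncover;
   length variable $8 level $8;
   input variable $ level $ value;
   datalines;
sexe|men|10
|female|20
age|0-20|5
|20-40|5
|40-60|10
|>60|10
;
run;

/* Carry the last non-blank value of variable forward */
data want;
   set have;
   length _last $8;   /* same length as variable, so nothing is truncated */
   retain _last;      /* _last is not reset to missing on each iteration */
   if not missing(variable) then _last = variable;
   else variable = _last;
   drop _last;
run;

proc print data=want noobs;
run;
```

Writing to a separate output dataset keeps the original intact, which makes it easier to compare the rows before and after the fill.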

LAG doesn't go to the previous record and fetch its value. Instead, it sets up a queue: each time LAG is called, it takes a value off the front and adds the current value to the back. This means that if LAG sits inside a conditional block, it doesn't execute when the condition is false, so the queue never receives the values from those rows. You can use the IFN and IFC functions, which evaluate both the true and false arguments regardless of the condition, but in this case RETAIN is probably easier.
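The queue behaviour described above can be seen by calling LAG in three different ways on the same rows. This is only a sketch: the dataset names `lag_in` and `lag_demo` and the variables `prev_always`, `prev_cond`, and `one_step` are hypothetical, and the sample rows are an assumption based on the question's table.

```sas
/* Assumed sample rows; blank first field on the rows to be filled */
data lag_in;
   infile datalines dsd truncover;
   length variable $8 level $8;
   input variable $ level $;
   datalines;
sexe,men
,female
age,0-20
,20-40
,40-60
;
run;

data lag_demo;
   set lag_in;
   length prev_always prev_cond one_step $8;

   /* Called on every row: this queue sees every value of variable */
   prev_always = lag(variable);

   /* Called only on blank rows: this queue only ever receives blanks.
      Each LAG call in the code has its own separate queue. */
   if missing(variable) then prev_cond = lag(variable);

   /* IFC evaluates all its arguments, so LAG runs on every row.
      It still only looks one row back, so runs of blanks are not filled. */
   one_step = ifc(missing(variable), lag(variable), variable);
run;

proc print data=lag_demo noobs;
run;
```

In the printed output, `prev_cond` should stay blank on every row, while `one_step` should fill only the first blank after each non-blank value. That is why RETAIN is the better fit when several blank rows follow one another.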

answered Jan 26 '23 06:01 by Joe