Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

function to track the changes in a field

I need a function (using base SAS or RStudio) that will enable me to determine the ID numbers as of a certain date and the original (root) ID numbers as of the start date. The dataset includes the old ID, the new ID, and the date the ID changed. Example data:

OldID NewID Change Date
1 2 1/1/10
10 11 1/1/10
2 3 7/1/10
3 4 7/10/10
11 12 8/1/10

I need to know the ID numbers as of 7/15/10 and the original (root) ID (as of 1/1/10). The output should look like this:

OrigID LastID
1 4
10 11

I will then need a flag that will help me count the number of OrigID's that changed over the given time interval (in this case, 1/1/10 to 7/15/10). I need to do similar counts for multiple dates after 7/15/10 as well.

Is there a function in base SAS or RStudio that can do this?

It doesn't appear that the functions in SAS/R I researched (hierarchic loggers, synchronous tracking, sequence tracking functions) will work (e.g., logger, lumberjack, log4r, validate, futile.logger)

like image 850
Reza Avatar asked Jun 08 '21 22:06

Reza


People also ask

Is there a track changes function in Excel?

When you highlight changes as you work, Excel outlines any revisions (such as changes, insertions, and deletions) with a highlighting color. On the Review tab, click Track Changes, and then click Highlight Changes. Select the Track changes while editing.

What is a track change in Word?

Track Changes is a built in feature in Microsoft Word which allows you to see the changes that were made to the document. To view changes, or to edit a document with them, follow the instructions below. To learn how to use the comment feature, click here.


1 Answers

There are many tools in SAS for finding the connected subgraphs from the graph defined by your table of [OLDID,NEWID] edges. For example PROC OPTNET from SAS/OR. Or the %SUBGRAPHS macro created by PGStats.

So let's start by converting your listing into an actual dataset.

data have ;
  input OldID NewID Date :mmddyy.;
  format date yymmdd10.;
cards;
1 2 1/1/10
10 11 1/1/10
2 3 7/1/10
3 4 7/10/10
11 12 8/1/10
;

Then call the %SUBGRAPHS() macro to get the CLUST (subgraph id) calculated for each node.

%SubGraphs(have,from=oldid,to=newid,out=clusters);

Now re-combine it with the original data so that you have the dates.

proc sql;
  create table groups as 
    select distinct a.clust,b.*
    from clusters a
    inner join have b
      on a.node = b.oldid or a.node=b.newid
    order by a.clust,b.date
  ;
quit;

Once you match the records in your data to the same subgraph id then finding the first/last node for any date range is simple:

data want ;
  do until (last.clust);
    set groups;
    by clust date;
    where '01JAN2010'd <= date <= '15JUL2010'd;
    if first.clust then origid=oldid;
  end;
  lastid=newid;
  keep origid lastid ;
run;

Of course if you actually wanted to filter the data by the dates before searching for the subgraphs you might get a larger number of subgraphs because you might have eliminated the edge that connects two groups of nodes.

like image 177
Tom Avatar answered Sep 20 '22 05:09

Tom