I need a function (using base SAS or RStudio) that will enable me to determine the ID numbers as of a certain date and the original (root) ID numbers as of the start date. The dataset includes the old ID, the new ID, and the date the ID changed. Example data:
OldID | NewID | Change Date |
---|---|---|
1 | 2 | 1/1/10 |
10 | 11 | 1/1/10 |
2 | 3 | 7/1/10 |
3 | 4 | 7/10/10 |
11 | 12 | 8/1/10 |
I need to know the ID numbers as of 7/15/10 and the original (root) ID (as of 1/1/10). The output should look like this:
OrigID | LastID |
---|---|
1 | 4 |
10 | 11 |
I will then need a flag that will help me count the number of OrigID's that changed over the given time interval (in this case, 1/1/10 to 7/15/10). I need to do similar counts for multiple dates after 7/15/10 as well.
Is there a function in base SAS or RStudio that can do this?
It doesn't appear that the functions in SAS/R I researched (hierarchic loggers, synchronous tracking, sequence tracking functions) will work (e.g., logger, lumberjack, log4r, validate, futile.logger)
When you highlight changes as you work, Excel outlines any revisions (such as changes, insertions, and deletions) with a highlighting color. On the Review tab, click Track Changes, and then click Highlight Changes. Select the Track changes while editing.
Track Changes is a built in feature in Microsoft Word which allows you to see the changes that were made to the document. To view changes, or to edit a document with them, follow the instructions below. To learn how to use the comment feature, click here.
There are many tools in SAS for finding the connected subgraphs from the graph defined by your table of [OLDID,NEWID] edges. For example PROC OPTNET from SAS/OR. Or the %SUBGRAPHS macro created by PGStats.
So let's start by converting your listing into an actual dataset.
data have ;
input OldID NewID Date :mmddyy.;
format date yymmdd10.;
cards;
1 2 1/1/10
10 11 1/1/10
2 3 7/1/10
3 4 7/10/10
11 12 8/1/10
;
Then call the %SUBGRAPHS() macro to get the CLUST (subgraph id) calculated for each node.
%SubGraphs(have,from=oldid,to=newid,out=clusters);
Now re-combine it with the original data so that you have the dates.
proc sql;
create table groups as
select distinct a.clust,b.*
from clusters a
inner join have b
on a.node = b.oldid or a.node=b.newid
order by a.clust,b.date
;
quit;
Once you match the records in your data to the same subgraph id then finding the first/last node for any date range is simple:
data want ;
do until (last.clust);
set groups;
by clust date;
where '01JAN2010'd <= date <= '15JUL2010'd;
if first.clust then origid=oldid;
end;
lastid=newid;
keep origid lastid ;
run;
Of course if you actually wanted to filter the data by the dates before searching for the subgraphs you might get a larger number of subgraphs because you might have eliminated the edge that connects two groups of nodes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With