function to track the changes in a field

Tags: r, hierarchical-data, sequence, sas, change-tracking

I need a function (using base SAS or RStudio) that will enable me to determine the ID numbers as of a certain date and the original (root) ID numbers as of the start date. The dataset includes the old ID, the new ID, and the date the ID changed. Example data:

OldID	NewID	Change Date
1	2	1/1/10
10	11	1/1/10
2	3	7/1/10
3	4	7/10/10
11	12	8/1/10

I need to know the ID numbers as of 7/15/10 and the original (root) ID (as of 1/1/10). The output should look like this:

OrigID	LastID
1	4
10	11

I will then need a flag that lets me count the number of OrigIDs that changed over the given time interval (in this case, 1/1/10 to 7/15/10). I need to do similar counts for multiple dates after 7/15/10 as well.

Is there a function in base SAS or RStudio that can do this?

The SAS/R tools I researched (hierarchical loggers, synchronous tracking, and sequence-tracking functions such as logger, lumberjack, log4r, validate, and futile.logger) do not appear to do this.

asked Jun 08 '21 22:06

Reza

1 Answer

SAS has several tools for finding the connected subgraphs of the graph defined by your table of [OLDID,NEWID] edges, for example PROC OPTNET from SAS/OR or the user-written %SUBGRAPHS macro created by PGStats.

So let's start by converting your listing into an actual dataset.

data have ;
  input OldID NewID Date :mmddyy.;
  format date yymmdd10.;
cards;
1 2 1/1/10
10 11 1/1/10
2 3 7/1/10
3 4 7/10/10
11 12 8/1/10
;

Then call the %SUBGRAPHS() macro to get the CLUST (subgraph id) calculated for each node.

%SubGraphs(have,from=oldid,to=newid,out=clusters);
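
If the %SubGraphs macro is not available, the idea behind the CLUST value can be illustrated in base R. This is only a sketch and not the macro's implementation: label_clusters is a hypothetical helper that treats each [OldID, NewID] pair as an undirected edge and keeps pushing the smallest label across edges until nothing changes.

```r
# Hypothetical helper (not the %SubGraphs macro): label connected components
# of an undirected edge list by propagating the smallest label along edges.
label_clusters <- function(from, to) {
  nodes <- sort(unique(c(from, to)))
  clust <- seq_along(nodes)              # start with one cluster per node
  i <- match(from, nodes)                # position of each edge's first endpoint
  j <- match(to, nodes)                  # position of each edge's second endpoint
  repeat {
    low <- pmin(clust[i], clust[j])      # smaller current label on each edge
    new <- clust
    for (k in seq_along(i)) {            # give that label to both endpoints
      new[i[k]] <- min(new[i[k]], low[k])
      new[j[k]] <- min(new[j[k]], low[k])
    }
    if (identical(new, clust)) break     # stop once no label changed
    clust <- new
  }
  # renumber the surviving labels as 1, 2, ...
  data.frame(node = nodes, clust = match(clust, unique(clust)))
}

have <- data.frame(
  OldID = c(1, 10, 2, 3, 11),
  NewID = c(2, 11, 3, 4, 12),
  Date  = as.Date(c("2010-01-01", "2010-01-01", "2010-07-01",
                    "2010-07-10", "2010-08-01"))
)
clusters <- label_clusters(have$OldID, have$NewID)
# nodes 1, 2, 3, 4 share one cluster; nodes 10, 11, 12 share the other
```

The loop is quadratic in the worst case, which is fine for a small ID-change table but not for a large graph.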

Now re-combine it with the original data so that you have the dates.

proc sql;
  create table groups as 
    select distinct a.clust,b.*
    from clusters a
    inner join have b
      on a.node = b.oldid or a.node=b.newid
    order by a.clust,b.date
  ;
quit;

Once each record is matched to its subgraph id, finding the first and last node for any date range is simple:

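data want ;
  /* DOW loop: read every row of one CLUST group per data step iteration */
  do until (last.clust);
    set groups;
    by clust date;
    /* WHERE filters rows before BY processing, so FIRST./LAST. see only the date range */
    where '01JAN2010'd <= date <= '15JUL2010'd;
    /* earliest edge in the group supplies the original (root) id */
    if first.clust then origid=oldid;
  end;
  /* after the loop NEWID holds the value from the latest edge in the range */
  lastid=newid;
  keep origid lastid ;
run;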

Of course, if you filter the data by date before searching for the subgraphs, you might get a larger number of subgraphs, because the filter may have removed the edge that connects two groups of nodes.
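
Since the question also asks about R, the same lookup can be sketched in base R for several as-of dates. ids_as_of is a hypothetical helper, not an existing function. It assumes that each ID is renamed at most once, that the root IDs are those that never appear as a NewID anywhere in the data, and that a change dated on the as-of date itself counts. The Changed flag marks roots whose ID is different by that date.

```r
have <- data.frame(
  OldID = c(1, 10, 2, 3, 11),
  NewID = c(2, 11, 3, 4, 12),
  Date  = as.Date(c("2010-01-01", "2010-01-01", "2010-07-01",
                    "2010-07-10", "2010-08-01"))
)

# Hypothetical helper: follow each root ID through the changes dated on or
# before as_of. Assumes every OldID occurs at most once in the edge table.
ids_as_of <- function(edges, as_of) {
  roots <- setdiff(edges$OldID, edges$NewID)  # IDs that are never a NewID
  e <- edges[edges$Date <= as_of, ]           # changes that have happened so far
  last <- roots
  # a chain has at most nrow(e) links, so this bound also guards against cycles
  for (step in seq_len(nrow(e))) {
    nxt <- e$NewID[match(last, e$OldID)]      # next ID in each chain, NA at the end
    if (all(is.na(nxt))) break
    last <- ifelse(is.na(nxt), last, nxt)
  }
  data.frame(OrigID = roots, LastID = last, Changed = last != roots)
}

ids_as_of(have, as.Date("2010-07-15"))
# OrigID 1 maps to LastID 4, OrigID 10 maps to LastID 11

# Count of changed roots for several as-of dates
as_of_dates <- as.Date(c("2010-07-15", "2010-08-15"))
n_changed <- vapply(
  seq_along(as_of_dates),
  function(k) sum(ids_as_of(have, as_of_dates[k])$Changed),
  integer(1)
)
```

Here the roots are taken from the full table and only the edges are filtered by date, so an early cut-off cannot split a chain into separate groups the way filtering before the subgraph search can.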

answered Sep 20 '22 05:09

Tom
