I have the following data set
PatientName BVAID Rank TreatmentCode TreatmentID DoseID
Tim Stuart BVA-027 3 OP_TBC 1 1
Tim Stuart BVA-041 4 OP_TBC 1 1
Tim Stuart BVA-021 7 OP_TBC 1 1
Tim Stuart BVA-048 10 OP_TBC 1 1
Tim Stuart BVA-020 14 OP_TBC 1 1
Tim Stuart BVA-024 15 OP_TBC 1 1
Tim Stuart BVA-001 16 OP_TBC 1 1
Tim Stuart BVA-013 27 OP_TBC 1 1
Tim Stuart BVA-018 28 OP_TBC 1 1
Tim Stuart BVA-051 29 OP_TBC 1 1
Tim Stuart BVA-027 3 OP_TC 2 1
Tim Stuart BVA-041 4 OP_TC 2 1
Tim Stuart BVA-048 10 OP_TC 2 1
Tim Stuart BVA-020 14 OP_TC 2 1
Tim Stuart BVA-001 16 OP_TC 2 1
Tim Stuart BVA-002 17 OP_TC 2 1
Tim Stuart BVA-019 18 OP_TC 2 1
Tim Stuart BVA-044 22 OP_TC 2 1
Tim Stuart BVA-025 23 OP_TC 2 1
Tim Stuart BVA-016 26 OP_TC 2 1
Tim Stuart BVA-013 27 OP_TC 2 1
Tim Stuart BVA-001 16 OP_SICO 3 1
Tim Stuart BVA-002 17 OP_SICO 3 1
Tim Stuart BVA-013 27 OP_SICO 3 1
I need to output the records with the smallest rank
in each TreatmentID
group however if the record was outputted in the previous TreatmentID
group I need to select the next smallest rank
and output the record for the TreamtmentID
group - I only need one record per TreatmentID
group.
This needs to be a scalable solution that I can automate.
My output file will only have tree unique records i.e. one per each group and each records is unique in BVAID
and will have the smallest rank in that group.
PatientName BVAID Rank TreatmentCode TreatmentID DoseID
Tim Stuart BVA-027 3 OP_TBC 1 1
Tim Stuart BVA-041 4 OP_TC 2 1
Tim Stuart BVA-001 16 OP_SICO 3 1
which program can handle this better SAS or R
To sort a data frame in R, use the order( ) function. By default, sorting is ASCENDING. Prepend the sorting variable by a minus sign to indicate DESCENDING order.
By default, the PRINT procedure displays observations in the order in which they appear in your data set. Alternatively, you can use the SORT procedure to first sort your data set based on the values of one or more variables.
What Does the SORT Procedure Do? The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables. The SORT procedure either replaces the original data set or creates a new data set. PROC SORT produces only an output data set.
Compact, scalable and readable R solution :
require(data.table)
DT = as.data.table(dat) # dat input from Brian's answer
r = 0
DT[,{r<<-min(Rank[Rank>r]); .SD[Rank==r]}, by=TreatmentID]
TreatmentID PatientName BVAID Rank TreatmentCode DoseID
[1,] 1 Tim Stuart BVA-027 3 OP_TBC 1
[2,] 2 Tim Stuart BVA-041 4 OP_TC 1
[3,] 3 Tim Stuart BVA-001 16 OP_SICO 1
My SAS solution. All steps are scalable:
data test;
input PatientName $ 1-10
BVAID $
Rank
TreatmentCode $
TreatmentID
DoseID
;
datalines;
Tim Stuart BVA-027 3 OP_TBC 1 1
Tim Stuart BVA-041 4 OP_TBC 1 1
Tim Stuart BVA-021 7 OP_TBC 1 1
Tim Stuart BVA-048 10 OP_TBC 1 1
Tim Stuart BVA-020 14 OP_TBC 1 1
Tim Stuart BVA-024 15 OP_TBC 1 1
Tim Stuart BVA-001 16 OP_TBC 1 1
Tim Stuart BVA-013 27 OP_TBC 1 1
Tim Stuart BVA-018 28 OP_TBC 1 1
Tim Stuart BVA-051 29 OP_TBC 1 1
Tim Stuart BVA-027 3 OP_TC 2 1
Tim Stuart BVA-041 4 OP_TC 2 1
Tim Stuart BVA-048 10 OP_TC 2 1
Tim Stuart BVA-020 14 OP_TC 2 1
Tim Stuart BVA-001 16 OP_TC 2 1
Tim Stuart BVA-002 17 OP_TC 2 1
Tim Stuart BVA-019 18 OP_TC 2 1
Tim Stuart BVA-044 22 OP_TC 2 1
Tim Stuart BVA-025 23 OP_TC 2 1
Tim Stuart BVA-016 26 OP_TC 2 1
Tim Stuart BVA-013 27 OP_TC 2 1
Tim Stuart BVA-001 16 OP_SICO 3 1
Tim Stuart BVA-002 17 OP_SICO 3 1
Tim Stuart BVA-013 27 OP_SICO 3 1
;
run;
proc sort data=test;
by treatmentid;
run;
data test2;
set test;
by treatmentid;
retain smallest;
**
** CREATE AN EMPTY HASH TABLE THAT WE CAN STORE A LIST OF
** RANKS IN THAT HAVE ALREADY BEEN USED. DONE THIS WAY FOR
** SCALABILITY.
*;
if _n_ eq 1 then do;
declare hash ht();
ht.definekey ('rank');
ht.definedone();
end;
if first.treatmentid then do;
smallest = .;
end;
**
** IF THE CURRENT RANK HAS NOT ALREADY BEEN USED THEN
** EVALUATE IT TO SEE IF ITS THE SMALLEST VALUE.
*;
if ht.find() ne 0 then do;
smallest = min(smallest,rank);
end;
**
** SAVE THE SMALLEST UNUSED RANK BACK TO THE RANK VALUE.
** THEN ADD IT TO THE HASH TABLE AND FINALLY OUTPUT THE
** OBSERVATION.
*;
if last.treatmentid then do;
rank = smallest;
ht.add();
output;
end;
drop smallest;
run;
Does SAS win? jk! ;-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With