I have three data sets of inpatient, outpatient, and professional claims. I want to find the number of unique people who have a claim related to tobacco use (1=yes tobacco, 0=tobacco) in ANY of these three data sets.
Therefore, the data sets pretty much are all:
data inpatient;
input Patient_ID Tobacco;
datalines;
1 0
2 1
3 1
4 1
5 0
;
run;
I am trying to merge the inpatient, outpatient, and professional so that I am left with those patient ids that have a tobacco claim in any of the three data sets using:
data tobaccoall;
merge inpatient outpatient professional;
by rid;
run;
However, it is overwriting some of the 1's with 0's in the new data set. How do I better merge the data sets to find if the patient has a claim in ANY of the datasets?
When you merge data sets in SAS that share variable names, the values from the data set listed on the right in the merge
statement overwrite the values from data set to its left. In order to keep each value, you'd want to rename the variables before merging. You can do this in the merge
statement by adding a rename=
option after each data set.
If you want a single variable that represents whether a tobacco claim exists in any of the three variables, you could create a new variable using the max
function to combine the three different values.
data tobaccoall;
merge inpatient (rename=(tobacco=tobacco_in))
outpatient (rename=(tobacco=tobacco_out))
professional (rename=(tobacco=tobacco_pro));
by rid;
tobacco_any = max(tobacco_in,tobacco_out,tobacco_pro,0);
run;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With