There's a large database from which I have extracted a study population. For comparison purposes, I would like to select a control group that has similar characteristics. The two criteria on what I would like to match are age and gender. The query to give me the numbers that I want for matching purposes is
select sex, age/10 as decades,COUNT(*) as counts
from
(
select distinct m.patid
,m.sex,DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x group by sex, Age/10
with a result set that looks like
The decades column in this age is given by the expression
(DATEPART(year,min(c.admitdate)) -m.yrdob)/10
and this is used to find people in the age ranges of 20-29, 30-39 etc using integer division. From a larger dataset I would like to select, for instance, 507 females who are in their 20s. The query to find the characteristics of the larger dataset is
select distinct m.patid
,m.sex
,(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
EDIT: results from second query
So I need the sum
of the decades column in the second query to equal counts
in the first query. What I tried (and returned zero results) is below. What do I need to do to match these ages?
query that runs, but returns no results:
select x.PATID--,x.sex,x.decades,y.counts
from
(
select distinct m.patid
,m.sex
,(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
) as x
inner join
(
select sex, age/10 as decades,COUNT(*) as counts
from
(
select distinct m.patid
,m.sex,DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x group by sex, Age/10
) as y on x.sex=y.sex and x.decades=y.decades
group by y.counts,x.PATID,x.sex,y.sex
having SUM(x.decades)=y.counts and x.sex=y.sex
select
T1.sex,
T1.decades,
T1.counts,
T2.patid
from (
select
sex,
age/10 as decades,
COUNT(*) as counts
from (
select m.patid,
m.sex,
DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x
group by sex, Age/10
) as T1
join (
--right here is where the random sampling occurs
SELECT TOP 50--this is the total number of peolpe in our dataset
patid
,sex
,decades
from (
select m.patid,
m.sex,
(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex, m.yrdob
) T2
order by NEWID()
) as T2
on T2.sex = T1.sex
and T2.decades = T1.decades
EDIT: I had posted another question similar to this in which I found that my results weren't in fact random, but they were only the TOP N results. I had ordered by newid()
in the outermost query and all that was doing was shuffling around the exact same result set. From a question that is now closed, I found out that I needed to use the TOP
keyword along with order by newid()
in the commented line in the above query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With