I have this dataset :
> head(xc)
wheeze3 SmokingGroup_Kai TG2000 TG2012 PA_Score asthma3 tres3 age3 bmi bmi3
1 0 1 2 2 2 0 0 47 20.861 21.88708
2 0 5 2 3 3 0 0 57 20.449 23.05175
3 0 1 2 3 2 0 0 45 25.728 26.06168
4 0 2 1 1 3 0 0 48 22.039 23.50780
5 1 4 2 2 1 0 1 61 25.391 25.63692
6 0 4 2 2 2 0 0 54 21.633 23.66144
education3 group_change
1 2 0
2 2 3
3 3 3
4 3 0
5 1 0
6 2 0
Here
asthma3
is a variable that takes values 0,1 ;
group_change
takes values 0,1,2,3,4,5,6 ;
age3
represents the age.
I would like to plot the percentage of people with asthma3==1
as a function of the variable age3
.
I would like 6 lines on the same plot obtained dividing the samples by group_change
.
I think that this should be possible using ggplot2
.
Here's a ggplot2 approach:
library(ggplot2)
library(dplyr)
# Create fake data
set.seed(10)
xc=data.frame(age3=sample(40:50, 500, replace=TRUE),
asthma3=sample(0:1,500, replace=TRUE),
group_change=sample(0:6, 500, replace=TRUE))
# Summarize asthma percent by group_change and age3 (using dplyr)
xc1 = xc %.%
group_by(group_change, age3) %.%
summarize(asthma.pct=mean(asthma3)*100)
# Plot using ggplot2
ggplot(xc1, aes(x=age3, y=asthma.pct, colour=as.factor(group_change))) +
geom_line() +
geom_point() +
scale_x_continuous(breaks=40:50) +
xlab("Age") + ylab("Asthma Percent") +
scale_colour_discrete(name="Group Change")
Here's another ggplot2 approach that works directly with the original data frame and calculates the percentages on the fly. I've also formatted the y-axis in percent format.
library(scales) # Need this for "percent_format()"
ggplot(xc, aes(x=age3, y=asthma3, colour=as.factor(group_change))) +
stat_summary(fun.y=mean, geom='line') +
stat_summary(fun.y=mean, geom='point') +
scale_x_continuous(breaks=40:50) +
scale_y_continuous(labels=percent_format()) +
xlab("Age") + ylab("Asthma Percent") +
scale_colour_discrete(name="Group Change")
Here is one way using Stata. The example data has three groups.
The proportions are computed taking the mean of asthma3
which you identify as a binary variable.
clear all
set more off
*----- example data -----
set obs 500
set seed 135
gen age3 = floor((50-40+1)*runiform() + 40)
gen asthma3 = round(runiform())
egen group_change = seq(), to(3)
*----- pretty list -----
order age3 group_change
sort age3 group_change asthma3
list, sepby(age3)
*----- compute proportions -----
collapse (mean) asthma3, by(age3 group_change)
list
*----- syntax for graph and graph -----
levelsof(group_change), local(gc)
local i = 1
foreach g of local gc {
local call `call' || connected asthma3 age3 if group_change == `g', sort
local leg `leg' label(`i++' "Group`g'") // syntax for legend
}
twoway `call' legend(`leg') /// graph
title("Proportion with asthma by group")
This coincides with one of my first questions in Statalist. In Nick's words, you "build up the syntax" using a local macro and then feed that to twoway
.
@NickCox, in a comment, suggests an alternative:
<snip>
*----- compute proportions -----
collapse (mean) asthma3, by(age3 group_change)
list
*----- graph -----
separate asthma3, by(group_change) veryshortlabel
twoway connected asthma31-asthma33 age3, sort ///
title("Proportion with asthma by group")
<snip>
This second alternative creates new variables from the original asthma3
which I abbreviate in the call to twoway connected
as asthma31-asthma33
.
Both alternatives produce a legend identifying groups. Labels I leave to you (see help graph
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With