Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to plot relative frequencies in R or Stata

I have this dataset :

> head(xc)
  wheeze3 SmokingGroup_Kai TG2000 TG2012 PA_Score asthma3 tres3 age3    bmi     bmi3
1       0                1      2      2        2       0     0   47 20.861 21.88708
2       0                5      2      3        3       0     0   57 20.449 23.05175
3       0                1      2      3        2       0     0   45 25.728 26.06168
4       0                2      1      1        3       0     0   48 22.039 23.50780
5       1                4      2      2        1       0     1   61 25.391 25.63692
6       0                4      2      2        2       0     0   54 21.633 23.66144
  education3 group_change
1          2            0
2          2            3
3          3            3
4          3            0
5          1            0
6          2            0

Here asthma3 is a variable that takes values 0,1 ; group_change takes values 0,1,2,3,4,5,6 ; age3 represents the age.

I would like to plot the percentage of people with asthma3==1 as a function of the variable age3. I would like 6 lines on the same plot obtained dividing the samples by group_change.

I think that this should be possible using ggplot2.

like image 345
Donbeo Avatar asked Jan 10 '23 17:01

Donbeo


2 Answers

Here's a ggplot2 approach:

library(ggplot2)
library(dplyr)

# Create fake data
set.seed(10)
xc=data.frame(age3=sample(40:50, 500, replace=TRUE),
           asthma3=sample(0:1,500, replace=TRUE), 
           group_change=sample(0:6, 500, replace=TRUE)) 

# Summarize asthma percent by group_change and age3 (using dplyr)
xc1 = xc %.%
  group_by(group_change, age3) %.%
  summarize(asthma.pct=mean(asthma3)*100)

# Plot using ggplot2
ggplot(xc1, aes(x=age3, y=asthma.pct, colour=as.factor(group_change))) + 
  geom_line() + 
  geom_point() +
  scale_x_continuous(breaks=40:50) +
  xlab("Age") + ylab("Asthma Percent") +
  scale_colour_discrete(name="Group Change")

Here's another ggplot2 approach that works directly with the original data frame and calculates the percentages on the fly. I've also formatted the y-axis in percent format.

library(scales) # Need this for "percent_format()"
ggplot(xc, aes(x=age3, y=asthma3, colour=as.factor(group_change))) + 
  stat_summary(fun.y=mean, geom='line') +
  stat_summary(fun.y=mean, geom='point') +
  scale_x_continuous(breaks=40:50) +
  scale_y_continuous(labels=percent_format()) +
  xlab("Age") + ylab("Asthma Percent") +
  scale_colour_discrete(name="Group Change")
like image 101
eipi10 Avatar answered Jan 13 '23 06:01

eipi10


Here is one way using Stata. The example data has three groups.

The proportions are computed taking the mean of asthma3 which you identify as a binary variable.

clear all
set more off

*----- example data -----

set obs 500
set seed 135

gen age3 = floor((50-40+1)*runiform() + 40)
gen asthma3 = round(runiform())
egen group_change = seq(), to(3)

*----- pretty list -----

order age3 group_change 
sort age3 group_change asthma3
list, sepby(age3)

*----- compute proportions -----

collapse (mean) asthma3, by(age3 group_change)
list

*----- syntax for graph and graph -----

levelsof(group_change), local(gc)

local i = 1
foreach g of local gc {
    local call `call' || connected asthma3 age3 if group_change == `g', sort
    local leg `leg' label(`i++' "Group`g'") // syntax for legend
}

twoway `call' legend(`leg') /// graph
    title("Proportion with asthma by group")

This coincides with one of my first questions in Statalist. In Nick's words, you "build up the syntax" using a local macro and then feed that to twoway.

@NickCox, in a comment, suggests an alternative:

<snip>

*----- compute proportions -----

collapse (mean) asthma3, by(age3 group_change)
list

*----- graph -----

separate asthma3, by(group_change) veryshortlabel

twoway connected asthma31-asthma33 age3, sort ///
    title("Proportion with asthma by group")

<snip>

This second alternative creates new variables from the original asthma3 which I abbreviate in the call to twoway connected as asthma31-asthma33.

Both alternatives produce a legend identifying groups. Labels I leave to you (see help graph).

like image 43
Roberto Ferrer Avatar answered Jan 13 '23 05:01

Roberto Ferrer