I'm developing a model that forecasts completed fertility for an age cohort. I currently have a data frame like this, where the rows are ages and the columns are years. The value in each cell is age-specific fertility for that year:
> df1
iso3 sex age fert1953 fert1954 fert1955
14 AUS female 13 0.000 0.00000 0.00000
15 AUS female 14 0.000 0.00000 0.00000
16 AUS female 15 13.108 13.42733 13.74667
17 AUS female 16 26.216 26.85467 27.49333
18 AUS female 17 39.324 40.28200 41.24000
However, what I want is each row to be a cohort. Because the rows and columns represent individual years, the cohort data can be obtained by getting the diagonal. I'm looking for a result like this:
> df2
iso3 sex ageIn1953 fert1953 fert1954 fert1955
14 AUS female 13 0.000 0.00000 13.74667
15 AUS female 14 0.000 13.42733 27.49333
16 AUS female 15 13.108 26.85467 41.24000
17 AUS female 16 26.216 40.28200 [data..]
18 AUS female 17 39.324 [data..] [data..]
Here's the df1
data frame:
df1 <- structure(list(iso3 = c("AUS", "AUS", "AUS", "AUS", "AUS"), sex = c("female",
"female", "female", "female", "female"), age = c(13, 14, 15,
16, 17), fert1953 = c(0, 0, 13.108, 26.216, 39.324), fert1954 = c(0,
0, 13.4273333333333, 26.8546666666667, 40.282), fert1955 = c(0,
0, 13.7466666666667, 27.4933333333333, 41.24)), .Names = c("iso3",
"sex", "age", "fert1953", "fert1954", "fert1955"), class = "data.frame", row.names = 14:18)
EDIT:
Here's the solution I ultimately used. It's based on David's answer, but I needed to do this for each level of iso3
.
df.ls <- lapply(split(f3, f = f3$iso3), FUN = function(df1) {
n <- ncol(df1) - 4
temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
return(cbind(df1[seq_len(4)], temp))
})
f4 <- do.call("rbind", df.ls)
I haven't tested for speed, but data.table
v1.9.5, recently implemented a new (written in C) lead/lag function called shift
So for the columns you want to shift, you could potentially use it combined with mapply
, for example
library(data.table)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) shift(x, n = y, type = "lead"), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp) # combining back with the unchanged columns
# iso3 sex age fert1953 fert1954 fert1955
# 14 AUS female 13 0.000 0.00000 13.74667
# 15 AUS female 14 0.000 13.42733 27.49333
# 16 AUS female 15 13.108 26.85467 41.24000
# 17 AUS female 16 26.216 40.28200 NA
# 18 AUS female 17 39.324 NA NA
Edit: You can easily install the development version of data.table
from GitHub using
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
Either way, if you want dplyr
, here goes
library(dplyr)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp)
# iso3 sex age fert1953 fert1954 fert1955
# 14 AUS female 13 0.000 0.00000 13.74667
# 15 AUS female 14 0.000 13.42733 27.49333
# 16 AUS female 15 13.108 26.85467 41.24000
# 17 AUS female 16 26.216 40.28200 NA
# 18 AUS female 17 39.324 NA NA
Here is a base R approach:
df1[,5:ncol(df1)] <- mapply(function(x, y) {vec.list <- df1[-1:-y, x]
length(vec.list) <- nrow(df1)
vec.list},
x=5:ncol(df1), y=1:(ncol(df1)-4))
df1
# iso3 sex age fert1953 fert1954 fert1955
#14 AUS female 13 0.000 0.00000 13.74667
#15 AUS female 14 0.000 13.42733 27.49333
#16 AUS female 15 13.108 26.85467 41.24000
#17 AUS female 16 26.216 40.28200 NA
#18 AUS female 17 39.324 NA NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With