I have a matrix-like data frame with an additional column denoting time. It contains information on the number of enrolled students in a given school, from grade 5 (column A
) to grade 9 (column E
).
time A B C D E
1 13 1842 1844 1689 1776 1716
2 14 1898 1785 1807 1617 1679
3 15 2065 1865 1748 1731 1590
4 16 2215 1994 1811 1708 1703
5 17 2174 2122 1903 1765 1699
I need to trace the size of the cohort over time, meaning that I need row-wise information on how many fifth graders from each starting year remained in the school from grades 6 through 9. For example, for the cohort that has begun fifth grade in 2013, I want information on how many remained in sixth grade in 2014, and so on.
Expected output
This is what I would like to end up with:
start.time point.A point.B point.C point.D point.E
1 13 1842 1785 1748 1708 1699
2 14 1898 1865 1811 1765 NA
3 15 2065 1811 1765 NA NA
4 16 2215 1765 NA NA NA
5 17 2174 NA NA NA NA
I have looked at diag()
from base.R
, but I could only get the the data from the main diagonal. Ideally, I'd like to accomplish this using dplyr
syntax and the pipe.
Data
structure(list(time = 13:17, A = c(1842, 1898, 2065, 2215, 2174), B = c(1844, 1785, 1865, 1994, 2122), C = c(1689, 1807, 1748, 1811, 1903), D = c(1776, 1617, 1731, 1708, 1765), E = c(1716, 1679, 1590, 1703, 1699)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), vars = "time", drop = TRUE, indices = list(
0L, 1L, 2L, 3L, 4L), group_sizes = c(1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
time = 13:17), class = "data.frame", row.names = c(NA, -5L), vars = "time", drop = TRUE, .Names = "time"), .Names = c("time", "A", "B", "C", "D", "E"))
Convert the input DF
except for the first column to a matrix mat
. Then since row(mat) - col(mat)
is constant on diagonals split
with respect to that creating a list of ts
class series in L
. We used ts
class since we can later cbind
them even if they are of different lengths. The diagonals for which row(mat) - col(mat) >= 0
are the only ones we want so pick off those, cbind
them together and transpose the result. Then replace all columns in DF
except the first with that. No packages are used.
mat <- as.matrix(DF[-1])
L <- lapply(split(mat, row(mat) - col(mat)), ts)
replace(DF, -1, t(do.call("cbind", L[as.numeric(names(L)) >= 0])))
giving:
time A B C D E
1 13 1842 1785 1748 1708 1699
2 14 1898 1865 1811 1765 NA
3 15 2065 1994 1903 NA NA
4 16 2215 2122 NA NA NA
5 17 2174 NA NA NA NA
Since you mentioned dplyr
in your question, you could use dplyr::lead
to shift the values of columns B
to E
by 1, 2 etc. respectively, and then bind the result with columns time
and A
from your original data as follows
library(tidyverse)
bind_cols(df[, 1:2], map2_df(.x = df[, c(3:ncol(df))],
.y = seq_along(df[, 3:ncol(df)]),
.f = ~dplyr::lead(x = .x, n = .y)))
# A tibble: 5 x 6
# Groups: time [5]
# time A B C D E
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 13 1842 1785 1748 1708 1699
#2 14 1898 1865 1811 1765 NA
#3 15 2065 1994 1903 NA NA
#4 16 2215 2122 NA NA NA
#5 17 2174 NA NA NA NA
Note that your data is grouped by time
the way you provided it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With