
Turn dataframe with a row for each id and law (with begin and end years) into a file with a row for each id and year

Tags: dataframe, r

I have a df called laws with a row for each law (one for each id):

laws <- data.frame(id=c(1,2,3),beginyear=c(2001,2002,2005),endyear=c(2003,2005,2006), law1=c(0,0,1), law2=c(1,0,1))

from which I want to create a second data frame called idyear, with a row for each id and year:

idyear <- data.frame(id=c(rep(1,6),rep(2,6),rep(3,6)), year=(rep(c(2001:2006),3)), law1=c(rep(0,16),1,1), law2=c(1,1,1,rep(0,13),1,1))

How can I efficiently write code that produces the idyear df from the laws df? In idyear, law1 and law2 should keep their value from laws when idyear$year >= laws$beginyear AND idyear$year <= laws$endyear, and be 0 for every other year.

I am a beginner with R, but I'm willing to try anything (apply, loops, etc.) to get this to work.

Oren Rosenberg asked Dec 13 '22 19:12



2 Answers

1) base: expand.grid creates an 18 x 2 data frame of all id and year combinations, and merge joins it back to laws on the shared id column. Next, zero out any law1 and law2 entry whose year is not between beginyear and endyear. Finally, drop the beginyear and endyear columns. No packages are used.

g <- with(laws, expand.grid(year = min(beginyear):max(endyear), id = id))
m <- merge(g, laws)
m[m$year < m$beginyear | m$year > m$endyear, c("law1", "law2")] <- 0
m <- subset(m, select = - c(beginyear, endyear))

# check
identical(m, idyear)
## [1] TRUE
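
If there are more than two law columns, the same steps can be wrapped in a small function. This is only a sketch: expand_laws and its law_cols argument are names I made up for illustration, and it zeroes the flags by multiplying with an in-range indicator instead of assigning 0, which should give the same values.

```r
# Example data from the question
laws <- data.frame(id = c(1, 2, 3),
                   beginyear = c(2001, 2002, 2005),
                   endyear = c(2003, 2005, 2006),
                   law1 = c(0, 0, 1),
                   law2 = c(1, 0, 1))

# expand_laws() is a hypothetical helper, not part of any package
expand_laws <- function(laws, law_cols = c("law1", "law2")) {
  # All id x year combinations over the full span of years
  g <- expand.grid(year = min(laws$beginyear):max(laws$endyear), id = laws$id)
  m <- merge(g, laws)  # joins on the only shared column, id
  # 1 when the year lies inside [beginyear, endyear], 0 otherwise
  in_range <- as.numeric(m$year >= m$beginyear & m$year <= m$endyear)
  # Multiplying keeps a flag inside its active years and zeroes it elsewhere
  m[law_cols] <- lapply(m[law_cols], function(x) x * in_range)
  # Sort explicitly so the result does not depend on merge's row order
  out <- m[order(m$id, m$year), c("id", "year", law_cols)]
  rownames(out) <- NULL
  out
}

res <- expand_laws(laws)
head(res)
```

Sorting explicitly by id and year avoids relying on whatever row order merge happens to return.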

2) magrittr: This is the same solution as (1), expressed as a magrittr pipeline. Note the mixture of pipe operators: %$% exposes the columns of its left-hand side by name, while %>% passes the whole object to the next call.

library(magrittr)

laws %$%
     expand.grid(year = min(beginyear):max(endyear), id = id) %>%
     merge(laws) %$%
     { .[year < beginyear | year > endyear, c("law1", "law2")] <- 0; .} %>%
     subset(select = - c(beginyear, endyear))
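
In case the two pipes are unfamiliar, here is a minimal sketch of how they differ, using the laws data from the question. The variable names (span_exposed, span_with, n_rows) are just for illustration.

```r
library(magrittr)

laws <- data.frame(id = c(1, 2, 3),
                   beginyear = c(2001, 2002, 2005),
                   endyear = c(2003, 2005, 2006),
                   law1 = c(0, 0, 1),
                   law2 = c(1, 0, 1))

# %$% makes the columns of laws visible by name, much like with()
span_exposed <- laws %$% range(c(beginyear, endyear))
span_with <- with(laws, range(c(beginyear, endyear)))
identical(span_exposed, span_with)

# %>% hands the whole data frame to the next call as its first argument
n_rows <- laws %>% nrow()
n_rows
```

That is why the pipeline in (2) uses %$% before the steps that refer to beginyear and endyear directly, and %>% before merge and subset, which take the data frame itself.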

Update: Fixed. Added (2).

G. Grothendieck answered Dec 18 '22 00:12



A solution using tidyverse. The final as.data.frame() is optional; it just converts the tbl to a plain data frame.

library(tidyverse)

idyear <- laws %>%
  mutate(year = map2(beginyear, endyear, `:`)) %>%
  unnest(year) %>%
  complete(id, year = full_seq(year, period = 1L), fill = list(law1 = 0L, law2 = 0L)) %>%
  select(-beginyear, -endyear) %>%
  as.data.frame()
idyear
#    id year law1 law2
# 1   1 2001    0    1
# 2   1 2002    0    1
# 3   1 2003    0    1
# 4   1 2004    0    0
# 5   1 2005    0    0
# 6   1 2006    0    0
# 7   2 2001    0    0
# 8   2 2002    0    0
# 9   2 2003    0    0
# 10  2 2004    0    0
# 11  2 2005    0    0
# 12  2 2006    0    0
# 13  3 2001    0    0
# 14  3 2002    0    0
# 15  3 2003    0    0
# 16  3 2004    0    0
# 17  3 2005    1    1
# 18  3 2006    1    1
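
To see what the mutate and unnest steps produce before complete fills in the gaps, here is a rough base R sketch of the same intermediate result. It is not what tidyr does internally, and years and long are just illustrative names.

```r
laws <- data.frame(id = c(1, 2, 3),
                   beginyear = c(2001, 2002, 2005),
                   endyear = c(2003, 2005, 2006),
                   law1 = c(0, 0, 1),
                   law2 = c(1, 0, 1))

# One vector of in-range years per law, like map2(beginyear, endyear, `:`)
years <- Map(`:`, laws$beginyear, laws$endyear)

# Repeat each law row once per in-range year, like unnest
long <- laws[rep(seq_len(nrow(laws)), lengths(years)), c("id", "law1", "law2")]
long$year <- unlist(years)
rownames(long) <- NULL

nrow(long)  # 3 + 4 + 2 in-range years
long
```

The remaining id and year combinations are the rows that complete adds, with law1 and law2 filled with 0.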
www answered Dec 17 '22 23:12
