I have ~ 4 million rows of personal data that looks like the following:
names <- c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter", "Peter", "Lisa", "Bert", "Carine", "Carine", "Carine", "Carine", "Carine", "Carine")
luckyToday <- c(0,0,0,NA,0,0,1,NA,1,NA,0,0,0,1,1)
luckyYesterday <- NA_real_
df1 <- data.frame(names,luckyToday,luckyYesterday)
df1
# names luckyToday luckyYesterday
# 1 Peter 0 NA
# 2 Peter 0 NA
# 3 Peter 0 NA
# 4 Peter NA NA
# 5 Peter 0 NA
# 6 Peter 0 NA
# 7 Peter 1 NA
# 8 Lisa NA NA
# 9 Bert 1 NA
# 10 Carine NA NA
# 11 Carine 0 NA
# 12 Carine 0 NA
# 13 Carine 0 NA
# 14 Carine 1 NA
# 15 Carine 1 NA
The data contains observations of people (some with 1 observation, some with more) and their luckiness (1=lucky, 0=unlucky, NA=no information). As kind of a lagged variable, I want to introduce a new variable ("luckyYesterday") that tells me if the person was lucky during the last observation or not. So I want the data look like this:
df2
# names luckyToday luckyYesterday
# 1 Peter 0 NA
# 2 Peter 0 0
# 3 Peter 0 0
# 4 Peter NA 0
# 5 Peter 0 0
# 6 Peter 0 0
# 7 Peter 1 0
# 8 Lisa NA NA
# 9 Bert 1 NA
# 10 Carine NA NA
# 11 Carine 0 0
# 12 Carine 0 0
# 13 Carine 0 0
# 14 Carine 1 0
# 15 Carine 1 1
I know that R is not the perfect programm to apply such data wrangling, but it is necessary.
I want to consider the following things:
I tried it by myself with 2 for-loops, but I takes ages on my data with over 4 million observations. Can anyone help my with a faster solution such as with data.table or an apply function, please? I would appreciate that so much!
Cheers
You can use the shift function from data.table to observe yesterday and na.locf function from zoo package to fill NA with yesterday or tomorrow depending on if the fromLast parameter is F or T, and also group by the name if you don't want to mix observations of different people:
library(data.table); library(zoo)
setDT(df1)[,luckyYesterday := shift(na.locf(luckyToday, fromLast = TRUE)), names]
df1
# names luckyToday luckyYesterday
# 1: Peter 0 NA
# 2: Peter 0 0
# 3: Peter 0 0
# 4: Peter NA 0
# 5: Peter 0 0
# 6: Peter 0 0
# 7: Peter 1 0
# 8: Lisa NA NA
# 9: Bert 1 NA
# 10: Carine NA NA
# 11: Carine 0 0
# 12: Carine 0 0
# 13: Carine 0 0
# 14: Carine 1 0
# 15: Carine 1 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With