I have a couple of files (60) containing tables with identical column names. I write down an exemple of three of them.
what I need is to read from each file (for(i in files) source(i) ?) the data and create a single table out of them where I extract for each ID one single row (with all variables) from each file. The first row starts at a certain date which gets incremented in the next files by one.
For instance, the "starting" row in the first file would be row no. 10 , which start on 10th of Jan. 10 ID1 2000-01-10 20
Now, I jump to the next file, I search for the same ID and subset the row no. 11 (incremented by one day) which represents the dates from the next day, 2000-01-11 Step to the next file, the same ID, subset the row no. 12 (incremented by one compared to the previous file), etc. The number of rows to be subsetted for each ID equals files number. The iteration goes till we reach all rows for ID1 or the last file. Then, we repeat the operations above for the next ID, namely ID2 where the starting row comes from the first file as well and has the same starting date as ID1. It is also row number 10 relative to its ID but 22 in the file: 22 = 10 (ID1 + 2 incremental).
Day = seq(as.Date("2000-01-01"), length = 12, by = "days")
file1:
set.seed(123)
df1 <- data.frame( rowid = c(1:24),
ID = c("ID1", "ID1","ID1", "ID1","ID1", "ID1","ID1", "ID1", "ID1", "ID1","ID1",
"ID1","ID2", "ID2", "ID2", "ID2","ID2", "ID2", "ID2", "ID2","ID2", "ID2", "ID2", "ID2"),
Date = rep(Day, 2),
Var1 = sample(1:30, 24, TRUE) )
file2:
set.seed(123)
df2 <- data.frame( rowid = c(1:24),
ID = c("ID1", "ID1","ID1", "ID1","ID1", "ID1","ID1", "ID1", "ID1", "ID1","ID1", "ID1","ID2", "ID2", "ID2", "ID2","ID2", "ID2", "ID2", "ID2","ID2", "ID2", "ID2", "ID2"),
Date = rep(Day, 2),
Var1 = sample(40:60, 24, TRUE))
file3:
set.seed(123)
df3 <- data.frame(rowid = c(1:24),
ID = c("ID1", "ID1","ID1", "ID1","ID1", "ID1","ID1", "ID1", "ID1", "ID1","ID1", "ID1","ID2", "ID2", "ID2", "ID2","ID2", "ID2", "ID2", "ID2","ID2", "ID2", "ID2", "ID2"),
Date = rep(Day, 2),
Var1 = sample(70:100, 24, TRUE))
The output should look like:
10 ID1 2000-01-10 20
11 ID1 2000-01-11 44
12 ID1 2000-01-12 83
22 ID2 2000-01-10 9
23 ID2 2000-01-11 50
24 ID2 2000-01-12 98
Any thought on how to get it? Thank you
Preparing your input like so:
df_list <- list(df1, df2, df3)
start <- 10
You could do:
library(purrr)
library(dplyr)
imap_dfr(df_list, \(file, index) split(file, ~ ID) %>%
map_dfr(~ slice(.x, start - 1 + index))) %>%
arrange(ID)
Which returns the desired:
rowid ID Date Var1
1 10 ID1 2000-01-10 20
2 11 ID1 2000-01-11 44
3 12 ID1 2000-01-12 83
4 22 ID2 2000-01-10 9
5 23 ID2 2000-01-11 50
6 24 ID2 2000-01-12 98
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With