Get element number of list while iterating through it

I have a list with the following structure:

myList <- replicate(5, data.frame(id = 1:10, mean = runif(10)), simplify = FALSE)

and I want to reduce it with a merge:

library(purrr)
myList %>% reduce(function(x, y) merge(x, y, by = 'id'))

That, however, leads to the following colnames:

    id     mean.x    mean.y    mean.x    mean.y       mean

While I would like something like

 id     mean1    mean2    mean3    mean4       mean5

Where the numbers are based on the order of myList.

Obviously I could iterate over 1:length(myList), but I find that solution inelegant. Another option would be to add a check inside the reducing function, but that would introduce a new linear-time search for each element of the list, so I don't believe it would be very efficient.
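
For reference, this is roughly the loop I would rather avoid (a rough sketch; merged is just a placeholder name):

# rename the mean column of each data frame by its position, then merge
for (i in seq_along(myList)) {
  names(myList[[i]])[names(myList[[i]]) == 'mean'] <- paste0('mean', i)
}
merged <- Reduce(function(x, y) merge(x, y, by = 'id'), myList)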

Is there another way to achieve this?

asked Dec 06 '22 by Joaquin

2 Answers

New answer:

Using rbindlist and dcast from the data.table package:

library(data.table)
mydata <- rbindlist(myList, idcol = 'df')
dcast(mydata, id ~ paste0('mean',df), value.var = 'mean')

Or with the tidyverse packages:

library(dplyr)
library(tidyr)
myList %>% 
  bind_rows(., .id = 'df') %>% 
  spread(df, mean) %>% 
  rename_at(-1, funs(paste0('mean',.)))

which both give (the data.table output is shown):

    id     mean1       mean2     mean3      mean4      mean5
 1:  1 0.6937674 0.005642891 0.4155868 0.74184186 0.54513885
 2:  2 0.3602352 0.569412043 0.8018570 0.29177043 0.34521060
 3:  3 0.6353133 0.512876032 0.8711914 0.44660086 0.16338451
 4:  4 0.2106574 0.555638598 0.8240744 0.37495213 0.57443740
 5:  5 0.9530160 0.059930577 0.0930678 0.39862717 0.91568414
 6:  6 0.3723244 0.598526326 0.4970844 0.01978011 0.07832631
 7:  7 0.2923137 0.712971846 0.3805590 0.25676592 0.11682605
 8:  8 0.6208868 0.426853621 0.5533876 0.64054247 0.78949419
 9:  9 0.9032609 0.274705843 0.3525957 0.46994429 0.32883110
10: 10 0.9707088 0.351394642 0.1089803 0.97969335 0.77791085
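
For what it's worth, on more recent versions of the tidyverse packages (tidyr >= 1.0.0, where spread() is superseded and funs() is deprecated), the same reshape can be written with pivot_wider(), which handles the prefix directly (a sketch, using the same myList):

library(dplyr)
library(tidyr)

myList %>% 
  bind_rows(., .id = 'df') %>%                  # 'df' holds the list position as "1".."5"
  pivot_wider(names_from = df, values_from = mean,
              names_prefix = 'mean')            # columns become mean1 .. mean5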

When there are duplicates in id in one or more of the data frames in myList, you have to adapt the dcast step to dcast(mydata, id + rowid(id, df) ~ paste0('mean', df), value.var = 'mean') to get the correct outcome. Check the following example to see the result:

myList <- replicate(5, data.frame(id = sample(1:10, 10, TRUE), mean = runif(10)), simplify = FALSE)
mydata <- rbindlist(myList, idcol = 'df')
dcast(mydata, id + rowid(id,df) ~ paste0('mean',df), value.var = 'mean')

This also works when there are no duplicates in id.

The tidyverse code then has to be adapted to:

myList %>% 
  bind_rows(., .id = 'df') %>% 
  group_by(df, id) %>% 
  mutate(ri = row_number()) %>% 
  ungroup() %>% 
  spread(df, mean) %>% 
  rename_at(3:7, funs(paste0('mean',.)))
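
The duplicate-safe variant translates to pivot_wider() in the same way (a sketch, with dplyr and tidyr loaded as above); the ri helper column stays in the result, just as in the spread() version:

myList %>% 
  bind_rows(., .id = 'df') %>% 
  group_by(df, id) %>% 
  mutate(ri = row_number()) %>% 
  ungroup() %>% 
  pivot_wider(names_from = df, values_from = mean, names_prefix = 'mean')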

Old answer (still valid):

A possible solution:

# option 1
myList <- mapply(function(x, y) { names(x)[2] <- paste0('mean', y); x },
                 myList, seq_along(myList), SIMPLIFY = FALSE)
Reduce(function(x, y) merge(x, y, by = 'id'), myList)

# option 2 (quite similar to @zx8754's solution)
mydata <- Reduce(function(x, y) merge(x, y, by = 'id'), myList)
setNames(mydata, c('id', paste0('mean', seq_along(myList))))

which gives:

   id     mean1     mean2      mean3      mean4      mean5
1   1 0.1119114 0.4193226 0.86619590 0.52543072 0.52879193
2   2 0.4630863 0.8786721 0.02012432 0.77274088 0.09227344
3   3 0.9832522 0.4687838 0.49074271 0.01611625 0.69919423
4   4 0.7017467 0.7845002 0.44692958 0.64485570 0.40808345
5   5 0.6204856 0.1687563 0.54407165 0.54236973 0.09947167
6   6 0.1480965 0.7654041 0.43591864 0.22468554 0.84557988
7   7 0.0179509 0.3610114 0.45420122 0.20612154 0.76899342
8   8 0.9862083 0.5579173 0.13540519 0.97311401 0.13947602
9   9 0.3140737 0.2213044 0.05187671 0.07870425 0.23880332
10 10 0.4515313 0.2367271 0.65728768 0.22149073 0.90578043
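
A purrr-flavoured variant of option 1, for readers who already load purrr for reduce() (a sketch; imap() passes each element's position as .y):

library(purrr)

myList %>% 
  imap(~ setNames(.x, c('id', paste0('mean', .y)))) %>%   # .y is the list position
  reduce(merge, by = 'id')
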
answered Jan 16 '23 by Jaap

You can also try to modify the function in the Reduce (or reduce) call so that the indices are added automatically:

Reduce(function(x, y){

         # get the positions of the columns that are not the common one, in x and y
         col_noby_x <- which(colnames(x) != "id")
         col_noby_y <- which(colnames(y) != "id")

         # maximum index already appended to the column names of x
         # (NA on the first pass, when no index has been added yet)
         ind_x <- suppressWarnings(
           max(as.numeric(sub("\\D*(\\d+)$", "\\1", colnames(x)[col_noby_x])))
         )

         # if there is no index yet, use 1 and 2; otherwise only rename the
         # columns of y, using the maximum index found in x plus one
         if (!is.na(ind_x)) {
           colnames(y)[col_noby_y] <- paste0(colnames(y)[col_noby_y], ind_x + 1)
         } else {
           colnames(x)[col_noby_x] <- paste0(colnames(x)[col_noby_x], 1)
           colnames(y)[col_noby_y] <- paste0(colnames(y)[col_noby_y], 2)
         }

         # finally merge
         merge(x, y, by = "id")}, myList)

#   id      mean1     mean2     mean3     mean4      mean5
#1   1 0.10698388 0.0277198 0.5109345 0.8885772 0.79983437
#2   2 0.29750846 0.7951743 0.9558739 0.9691619 0.31805857
#3   3 0.07115142 0.2401011 0.8106464 0.5101563 0.78697618
#4   4 0.39564336 0.7225532 0.7583893 0.4275574 0.77151883
#5   5 0.55860511 0.4111913 0.8403031 0.4284490 0.51489116
#6   6 0.92191777 0.9142926 0.4708712 0.2451099 0.84142501
#7   7 0.08218166 0.2741819 0.6772842 0.7939364 0.86930336
#8   8 0.35392512 0.2088531 0.0801731 0.2734870 0.62963218
#9   9 0.64068537 0.8427225 0.1904426 0.2389339 0.73145206
#10 10 0.31304719 0.9898133 0.8173664 0.2013031 0.04658273

answered Jan 16 '23 by Cath