Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing NA of numbered column in list of data frames

I have a large list of dataframes of the following structure:

foo <- 1:5
lorem1968 <- c(6, NA, NA, 8, NA)
lorem1969 <- c(NA, 17, NA, 19, 20)
df1 <- data.frame(foo, lorem1968, lorem1969)

ipsum <- 11:15
lorem1970 <- c(22, NA, 24, NA, NA)
df2 <- data.frame(ipsum, lorem1969, lorem1970)

df.list <- list(df1, df2)

[[1]]
  foo lorem1968 lorem1969
1   1         6        NA
2   2        NA        17
3   3        NA        NA
4   4         8        19
5   5        NA        20

[[2]]
  ipsum lorem1969 lorem1970
1    11        NA        22
2    12        17        NA
3    13        NA        24
4    14        19        NA
5    15        20        NA

I would like now to iterate over all columns named loremxxxx and replace all NA's there with 0. Then, I would like to create a new column in each df which contains the average of all loremxxxx columns contained in that specific df.

The problem is that these are overlapping panels in the original data, so any df1 contains lorem1968, lorem1969, lorem1970. df2 contains lorem1969, 1970, 1971. And so on.

I tried to select the columns like this:

lorem.cols <- purrr::map(panels.list, function(x)
  select(x, starts_with("lorem"))
  )

and also:

lorem.cols <- purrr::map(df.list, function(data)
  data %>% select(data, starts_with("lorem"))
)

but both threw an error of either not finding the function or of giving me "Selection:" and waiting for input. Just tried to copy from the help page of the select() function.

After I planned on replacing NAs like so:

df.list <- purrr::map(df.list, function(data)
  data %>% mutate(lorem.cols = replace(is.na(lorem.cols), 0))
  )

Thanks guys!

like image 356
schroederadrian Avatar asked Jan 26 '23 15:01

schroederadrian


2 Answers

We can use base R. Loop through the list with lapply, use grep to find the index of the column names that match 'lorem' followed by one or more digits, replace the NAs in those columns with 0, and transform the original dataset in the list to create a new column 'avg' by getting the mean of those 'lorem' columns

lapply(df.list, function(x) {
         i1 <- grep("^lorem\\d+$", names(x))
         x[i1] <- replace(x[i1], is.na(x[i1]), 0)
    transform(x, avg = rowMeans(x[i1], na.rm = TRUE))
   })
#[[1]]
#  foo lorem1968 lorem1969  avg
#1   1         6         0  3.0
#2   2         0        17  8.5
#3   3         0         0  0.0
#4   4         8        19 13.5
#5   5         0        20 10.0

#[[2]]
#  ipsum lorem1969 lorem1970  avg
#1    11         0        22 11.0
#2    12        17         0  8.5
#3    13         0        24 12.0
#4    14        19         0  9.5
#5    15        20         0 10.0
like image 156
akrun Avatar answered Feb 12 '23 19:02

akrun


Here's a data.table approach that relies on data.table update-by-reference that holds true in lapply() calls as well.

library(data.table)
lapply(df.list, setDT)

lapply(df.list,
       function(dt) {
         cols <- grep('^lorem', names(dt))
         setnafill(dt, fill = 0L, cols = cols)
         dt[, mean_lorem := rowMeans(.SD), .SDcols = cols]
         })
#> [[1]]
#>    foo lorem1968 lorem1969 mean_lorem
#> 1:   1         6         0        3.0
#> 2:   2         0        17        8.5
#> 3:   3         0         0        0.0
#> 4:   4         8        19       13.5
#> 5:   5         0        20       10.0
#> 
#> [[2]]
#>    ipsum lorem1969 lorem1970 mean_lorem
#> 1:    11         0        22       11.0
#> 2:    12        17         0        8.5
#> 3:    13         0        24       12.0
#> 4:    14        19         0        9.5
#> 5:    15        20         0       10.0
like image 31
Cole Avatar answered Feb 12 '23 20:02

Cole