Replacing NA of numbered column in list of data frames

Question

I have a large list of dataframes of the following structure:

foo <- 1:5
lorem1968 <- c(6, NA, NA, 8, NA)
lorem1969 <- c(NA, 17, NA, 19, 20)
df1 <- data.frame(foo, lorem1968, lorem1969)

ipsum <- 11:15
lorem1970 <- c(22, NA, 24, NA, NA)
df2 <- data.frame(ipsum, lorem1969, lorem1970)

df.list <- list(df1, df2)

[[1]]
  foo lorem1968 lorem1969
1   1         6        NA
2   2        NA        17
3   3        NA        NA
4   4         8        19
5   5        NA        20

[[2]]
  ipsum lorem1969 lorem1970
1    11        NA        22
2    12        17        NA
3    13        NA        24
4    14        19        NA
5    15        20        NA

I would like now to iterate over all columns named loremxxxx and replace all NA's there with 0. Then, I would like to create a new column in each df which contains the average of all loremxxxx columns contained in that specific df.

The problem is that these are overlapping panels in the original data, so any df1 contains lorem1968, lorem1969, lorem1970. df2 contains lorem1969, 1970, 1971. And so on.

I tried to select the columns like this:

lorem.cols <- purrr::map(panels.list, function(x)
  select(x, starts_with("lorem"))
  )

and also:

lorem.cols <- purrr::map(df.list, function(data)
  data %>% select(data, starts_with("lorem"))
)

but both threw an error of either not finding the function or of giving me "Selection:" and waiting for input. Just tried to copy from the help page of the select() function.

After I planned on replacing NAs like so:

df.list <- purrr::map(df.list, function(data)
  data %>% mutate(lorem.cols = replace(is.na(lorem.cols), 0))
  )

Thanks guys!

akrun · Accepted Answer

We can use base R. Loop through the list with lapply, use grep to find the index of the column names that match 'lorem' followed by one or more digits, replace the NAs in those columns with 0, and transform the original dataset in the list to create a new column 'avg' by getting the mean of those 'lorem' columns

lapply(df.list, function(x) {
         i1 <- grep("^lorem\d+$", names(x))
         x[i1] <- replace(x[i1], is.na(x[i1]), 0)
    transform(x, avg = rowMeans(x[i1], na.rm = TRUE))
   })
#[[1]]
#  foo lorem1968 lorem1969  avg
#1   1         6         0  3.0
#2   2         0        17  8.5
#3   3         0         0  0.0
#4   4         8        19 13.5
#5   5         0        20 10.0

#[[2]]
#  ipsum lorem1969 lorem1970  avg
#1    11         0        22 11.0
#2    12        17         0  8.5
#3    13         0        24 12.0
#4    14        19         0  9.5
#5    15        20         0 10.0

Cole · Answer

Here's a data.table approach that relies on data.table update-by-reference that holds true in lapply() calls as well.

library(data.table)
lapply(df.list, setDT)

lapply(df.list,
       function(dt) {
         cols <- grep('^lorem', names(dt))
         setnafill(dt, fill = 0L, cols = cols)
         dt[, mean_lorem := rowMeans(.SD), .SDcols = cols]
         })
#> [[1]]
#>    foo lorem1968 lorem1969 mean_lorem
#> 1:   1         6         0        3.0
#> 2:   2         0        17        8.5
#> 3:   3         0         0        0.0
#> 4:   4         8        19       13.5
#> 5:   5         0        20       10.0
#> 
#> [[2]]
#>    ipsum lorem1969 lorem1970 mean_lorem
#> 1:    11         0        22       11.0
#> 2:    12        17         0        8.5
#> 3:    13         0        24       12.0
#> 4:    14        19         0        9.5
#> 5:    15        20         0       10.0

Replacing NA of numbered column in list of data frames

Tags:

replace

list

r

na

purrr

schroederadrian

2 Answers

akrun

Cole

Recent Activity

Donate For Us

Replacing NA of numbered column in list of data frames

Tags:

replace

list

r

na

purrr

schroederadrian

2 Answers

akrun

Cole

Related questions

Recent Activity

Donate For Us