I have a large list of dataframes of the following structure:
foo <- 1:5
lorem1968 <- c(6, NA, NA, 8, NA)
lorem1969 <- c(NA, 17, NA, 19, 20)
df1 <- data.frame(foo, lorem1968, lorem1969)
ipsum <- 11:15
lorem1970 <- c(22, NA, 24, NA, NA)
df2 <- data.frame(ipsum, lorem1969, lorem1970)
df.list <- list(df1, df2)
[[1]]
foo lorem1968 lorem1969
1 1 6 NA
2 2 NA 17
3 3 NA NA
4 4 8 19
5 5 NA 20
[[2]]
ipsum lorem1969 lorem1970
1 11 NA 22
2 12 17 NA
3 13 NA 24
4 14 19 NA
5 15 20 NA
I would like now to iterate over all columns named loremxxxx and replace all NA's there with 0. Then, I would like to create a new column in each df which contains the average of all loremxxxx columns contained in that specific df.
The problem is that these are overlapping panels in the original data, so any df1 contains lorem1968, lorem1969, lorem1970. df2 contains lorem1969, 1970, 1971. And so on.
I tried to select the columns like this:
lorem.cols <- purrr::map(panels.list, function(x)
select(x, starts_with("lorem"))
)
and also:
lorem.cols <- purrr::map(df.list, function(data)
data %>% select(data, starts_with("lorem"))
)
but both threw an error of either not finding the function or of giving me "Selection:" and waiting for input. Just tried to copy from the help page of the select()
function.
After I planned on replacing NAs like so:
df.list <- purrr::map(df.list, function(data)
data %>% mutate(lorem.cols = replace(is.na(lorem.cols), 0))
)
Thanks guys!
We can use base R
. Loop through the list
with lapply
, use grep
to find the index of the column names that match 'lorem' followed by one or more digits, replace
the NA
s in those columns with 0, and transform
the original dataset in the list
to create a new column 'avg' by getting the mean
of those 'lorem' columns
lapply(df.list, function(x) {
i1 <- grep("^lorem\\d+$", names(x))
x[i1] <- replace(x[i1], is.na(x[i1]), 0)
transform(x, avg = rowMeans(x[i1], na.rm = TRUE))
})
#[[1]]
# foo lorem1968 lorem1969 avg
#1 1 6 0 3.0
#2 2 0 17 8.5
#3 3 0 0 0.0
#4 4 8 19 13.5
#5 5 0 20 10.0
#[[2]]
# ipsum lorem1969 lorem1970 avg
#1 11 0 22 11.0
#2 12 17 0 8.5
#3 13 0 24 12.0
#4 14 19 0 9.5
#5 15 20 0 10.0
Here's a data.table approach that relies on data.table
update-by-reference that holds true in lapply()
calls as well.
library(data.table)
lapply(df.list, setDT)
lapply(df.list,
function(dt) {
cols <- grep('^lorem', names(dt))
setnafill(dt, fill = 0L, cols = cols)
dt[, mean_lorem := rowMeans(.SD), .SDcols = cols]
})
#> [[1]]
#> foo lorem1968 lorem1969 mean_lorem
#> 1: 1 6 0 3.0
#> 2: 2 0 17 8.5
#> 3: 3 0 0 0.0
#> 4: 4 8 19 13.5
#> 5: 5 0 20 10.0
#>
#> [[2]]
#> ipsum lorem1969 lorem1970 mean_lorem
#> 1: 11 0 22 11.0
#> 2: 12 17 0 8.5
#> 3: 13 0 24 12.0
#> 4: 14 19 0 9.5
#> 5: 15 20 0 10.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With