
Iterate sequentially over two lists in R

I have two data frames that look something like this:

library(tidyverse)
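# copy the original values first, so that iris2 keeps the smaller maxima
iris2 <- iris
names(iris2) <- sub(".", "_", names(iris2), fixed = TRUE)
# then shift the four numeric columns of iris up by 2
iris <- iris %>% mutate_at(1:4, ~ . + 2)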

My aim is to reduce the values of the variables in iris that are above the maximum values of the corresponding variable in iris2, to match the maximum value in iris2.

I have written a function that does this.

max(iris$Sepal.Length) 
[1] 9.9
max(iris2$Sepal_Length)
[1] 7.9
# I want every value in iris that is >= the max of the matching iris2 column to be set to that max.

# my function:
fixmax<- function(data,data2,var1,var2) {
  data<- data %>% 
    mutate("{var1}" := ifelse(get(var1)>=max(data2[[var2]],na.rm = T),
                              max(data2[[var2]],na.rm = T),get(var1)))
  return(data)
}

# apply my function to a variable
tst_iris <- fixmax(iris,iris2,"Sepal.Length","Sepal_Length")
max(tst_iris$Sepal.Length)
[1] 7.9 # it works!

The challenge I face is that I would like to iterate my function sequentially over two lists of variables, i.e. Sepal.Length with Sepal_Length, Sepal.Width with Sepal_Width, etc.

Does anyone know how I can do this?

I tried using Map but I am doing something wrong.

lst1 <- names(iris[,1:4])
lst2 <- names(iris2[,1:4])
final_iris<- Map(fixmax,iris, iris2,lst1,lst2)

My goal is to obtain a df (final_iris) where every variable has been adjusted using the criteria specified by fixmax. I know I can do this by running my function on every variable like so.

final_iris <- iris
final_iris <- fixmax(final_iris,iris2,"Sepal.Length","Sepal_Length")
final_iris <- fixmax(final_iris,iris2,"Sepal.Width","Sepal_Width")
final_iris <- fixmax(final_iris,iris2,"Petal.Length","Petal_Length")
final_iris <- fixmax(final_iris,iris2,"Petal.Width","Petal_Width")

But in the real data, I have to run this operation tens of times, and I would like to loop my function over the variable pairs instead. Does anyone know how I can loop fixmax over lst1 and lst2 sequentially?

Alex asked Jul 22 '21 15:07



2 Answers

Rather than explicitly iterating over the different datasets and columns by name, you can take advantage of the vectorization built into R. If the data frames have the same column/variable ordering, a function mapped over both data frames using mapply or purrr::map2 will iterate column by column without the need to specify column names.

Given two input data frames (df_small and df_big) the steps are:

  1. Calculate the max of each column in df_small to create df_small_max
  2. Apply the pmin function to each column of df_big and each value of df_small_max using mapply (or purrr::map2_dfc if you prefer tidyverse mapping)

# set up fake data
df_small <- iris[,1:4]
df_big <- df_small + 2

# find max of each col in df_small
df_small_max <- sapply(df_small, max)

# replace values of df_big which are larger than df_small_max
df_big_fixed <- mapply(pmin, df_big, df_small_max)




# sanity check -- Note the change in Sepal.Width
df_small_max
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>          7.9          4.4          6.9          2.5
head(df_big, 3)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          7.1         5.5          3.4         2.2
#> 2          6.9         5.0          3.4         2.2
#> 3          6.7         5.2          3.3         2.2
head(df_big_fixed, 3)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]          7.1         4.4          3.4         2.2
#> [2,]          6.9         4.4          3.4         2.2
#> [3,]          6.7         4.4          3.3         2.2

Created on 2021-07-31 by the reprex package (v2.0.0)
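
As the output above shows, mapply simplifies its result to a matrix. If a data frame is wanted instead, one possible variant (a sketch, not part of the original reprex) is to assign the result of Map back into a copy of df_big. The name df_big_capped is made up for this illustration; the columns are matched by position, so the column names of the two data frames do not have to agree.

```r
# Same fake data as in the answer above
df_small <- iris[, 1:4]
df_big <- df_small + 2

# Per-column maxima of the smaller data frame
df_small_max <- sapply(df_small, max)

# df_big_capped is a hypothetical name used only in this sketch.
# Map() pairs the i-th column of df_big with the i-th maximum;
# assigning into df_big_capped[] keeps the data.frame structure.
df_big_capped <- df_big
df_big_capped[] <- Map(pmin, df_big, df_small_max)

class(df_big_capped)
sapply(df_big_capped, max)
```

Because every column of df_big was shifted up by 2, each column maximum of df_big_capped should now coincide with the corresponding entry of df_small_max.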

nniloc answered Oct 20 '22 00:10


It's likely that your issue is related to the fact that data frames are themselves lists (of columns). Map() iterates over its non-function arguments element by element, so it expects them to be lists or vectors of the same length. Any arguments that are shorter than the longest one are "recycled" to match its length.
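
A small sketch of this behaviour, using a toy data frame (df and the labels are invented for the illustration): passing a bare data frame makes Map() walk over its columns, while wrapping it in list() gives a single element that is recycled for every call.

```r
# Toy data frame, made up for this illustration
df <- data.frame(a = 1:3, b = 4:6)

is.list(df)   # a data frame is a list of columns
length(df)    # so its length is the number of columns

# Bare data frame: the function receives one COLUMN per call
per_column <- Map(function(col, label) paste(label, sum(col)),
                  df, c("first", "second"))

# Wrapped in list(): the function receives the WHOLE data frame each time,
# because the one-element list is recycled to the length of the labels
per_call <- Map(function(d, label) paste(label, nrow(d)),
                list(df), c("first", "second"))

per_column
per_call
```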

Currently, you have:

final_iris<- Map(fixmax,iris, iris2,lst1,lst2)

This is actually equivalent to:

final_iris<- Map(fixmax,
                 list(iris$Sepal.Length,
                      iris$Sepal.Width,
                      iris$Petal.Length,
                      iris$Petal.Width,
                      iris$Species),
                 list(iris2$Sepal_Length,
                      iris2$Sepal_Width,
                      iris2$Petal_Length,
                      iris2$Petal_Width,
                      iris2$Species),
                 lst1,
                 lst2)

I suspect that you want iris and iris2 to be supplied to each call to fixmax(). In order to have Map() recycle them like this, they need to be single-element lists. That is, you probably want:

final_iris<- Map(fixmax, list(iris), list(iris2),lst1,lst2)

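To combine the resulting list of data frames into a single data frame, you can row-bind them. Note that each element is a full copy of iris with only one column adjusted, so this stacks four 150-row data frames into one with 600 rows: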

do.call(rbind, final_iris)
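
If the goal is instead a single 150-row data frame in which every column has been adjusted in turn, one option is to fold over the variable pairs with Reduce(), feeding the result of each call into the next. The following is only a sketch under some assumptions: fixmax_base is a hypothetical base R stand-in for the question's fixmax (so the example runs without the tidyverse), and iris_big is a made-up name for the shifted copy of iris.

```r
# Rebuild the question's data with base R only
iris_big <- iris                      # hypothetical name for the shifted data
iris_big[1:4] <- iris_big[1:4] + 2
iris2 <- iris
names(iris2) <- sub(".", "_", names(iris2), fixed = TRUE)

# Hypothetical base R equivalent of fixmax: cap var1 at the max of var2
fixmax_base <- function(data, data2, var1, var2) {
  cap <- max(data2[[var2]], na.rm = TRUE)
  data[[var1]] <- pmin(data[[var1]], cap)
  data
}

lst1 <- names(iris_big)[1:4]
lst2 <- names(iris2)[1:4]

# Reduce() starts from iris_big and applies one variable pair per step,
# passing the accumulated data frame (acc) on to the next step
final_iris <- Reduce(
  function(acc, i) fixmax_base(acc, iris2, lst1[i], lst2[i]),
  seq_along(lst1),
  iris_big
)

dim(final_iris)
sapply(final_iris[1:4], max)
```

A plain for loop over seq_along(lst1) that reassigns final_iris on each pass would do the same thing; Reduce() is just a compact way to express that accumulation.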

Akindele Davies answered Oct 19 '22 23:10