I have two data frames that look something like this:
library(tidyverse)
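# copy the original data first, so iris2 keeps the unshifted values
iris2 <- iris
names(iris2) <- sub(".", "_", names(iris2), fixed = TRUE)
# then shift the four numeric columns of iris up by 2
iris <- iris %>% mutate_at(1:4, ~ . + 2)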
My aim is to reduce the values of the variables in iris that are above the maximum values of the corresponding variable in iris2, to match the maximum value in iris2.
I have written a function that does this.
max(iris$Sepal.Length)
[1] 9.9
max(iris2$Sepal_Length)
[1] 7.9
# I want every value of iris that is >= the max value of iris2 to be set to the max value of iris2.
# my function:
fixmax <- function(data, data2, var1, var2) {
  data <- data %>%
    mutate("{var1}" := ifelse(get(var1) >= max(data2[[var2]], na.rm = TRUE),
                              max(data2[[var2]], na.rm = TRUE), get(var1)))
  return(data)
}
# apply my function to a variable
tst_iris <- fixmax(iris,iris2,"Sepal.Length","Sepal_Length")
max(tst_iris$Sepal.Length)
[1] 7.9 # it works!
The challenge I face is that I would like to iterate my function sequentially over two lists of variables, i.e. Sepal.Length with Sepal_Length, Sepal.Width with Sepal_Width, etc.
Does anyone know how I can do this?
I tried using Map but I am doing something wrong.
lst1 <- names(iris[,1:4])
lst2 <- names(iris2[,1:4])
final_iris<- Map(fixmax,iris, iris2,lst1,lst2)
My goal is to obtain a df (final_iris) where every variable has been adjusted using the criteria specified by fixmax.
I know I can do this by running my function on every variable like so.
final_iris <- iris
final_iris <- fixmax(final_iris,iris2,"Sepal.Length","Sepal_Length")
final_iris <- fixmax(final_iris,iris2,"Sepal.Width","Sepal_Width")
final_iris <- fixmax(final_iris,iris2,"Petal.Length","Petal_Length")
final_iris <- fixmax(final_iris,iris2,"Petal.Width","Petal_Width")
But in the real data, I have to run this operation tens of times and I would like to be able to loop my function sequentially.
Does anyone know how I can loop my fixmax over lst1 and lst2 sequentially?
Rather than explicitly iterating over the different datasets and columns by name, you can take advantage of the vectorization built into R. If the data frames have the same column/variable ordering, a function mapped over both data frames using mapply or purrr::map2 will iterate column by column without the need to specify column names.
Given two input data frames (df_small and df_big), the steps are:
1. Find the max of each column in df_small to create df_small_max.
2. Apply the pmin function to each column of df_big and the corresponding value of df_small_max using mapply (or purrr::map2_dfc if you prefer tidyverse mapping).
# set up fake data
df_small <- iris[,1:4]
df_big <- df_small + 2
# find max of each col in df_small
df_small_max <- sapply(df_small, max)
# replace values of df_big which are larger than df_small_max
df_big_fixed <- mapply(pmin, df_big, df_small_max)
# sanity check -- Note the change in Sepal.Width
df_small_max
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width
#>          7.9          4.4          6.9          2.5
head(df_big, 3)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          7.1         5.5          3.4         2.2
#> 2          6.9         5.0          3.4         2.2
#> 3          6.7         5.2          3.3         2.2
head(df_big_fixed, 3)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,]          7.1         4.4          3.4         2.2
#> [2,]          6.9         4.4          3.4         2.2
#> [3,]          6.7         4.4          3.3         2.2
Created on 2021-07-31 by the reprex package (v2.0.0)
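One detail in the output above: mapply simplifies its result to a matrix, which is why the rows are labelled [1,], [2,] and so on. If a data frame is needed instead, a minimal sketch (assuming the same df_small and df_big as above, with identical column order; the name df_big_fixed_df is only illustrative) is to assign the mapped columns back into a copy of df_big:

```r
# same fake data as above
df_small <- datasets::iris[, 1:4]
df_big <- df_small + 2

# per-column maxima of df_small
df_small_max <- sapply(df_small, max)

# Map returns a list with one capped column per element; assigning into
# df_big_fixed_df[] keeps the data frame class and the column names
df_big_fixed_df <- df_big
df_big_fixed_df[] <- Map(pmin, df_big, df_small_max)

class(df_big_fixed_df)
sapply(df_big_fixed_df, max)
```

The capping logic is unchanged; only the container differs from the mapply version.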
It's likely that your issue is related to the fact that data frames are themselves lists. Map() expects the non-function arguments to be lists of the same length. Any arguments that are shorter than the longest list are "recycled" to match its length.
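As a small illustration of that recycling behaviour, here is a sketch using the built-in iris data; the anonymous function and the names cols and res are made up for this example and are not part of the question's code:

```r
cols <- c("Sepal.Length", "Sepal.Width")

# a data frame is a list of its columns, so Map would walk over 5 elements
length(datasets::iris)

# wrapped in list(), Map sees a single element and recycles it, so every
# call receives the whole data frame together with one column name
res <- Map(function(df, col) max(df[[col]]), list(datasets::iris), cols)

length(res)  # one result per column name
```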
Currently, you have:
final_iris<- Map(fixmax,iris, iris2,lst1,lst2)
This is actually equivalent to:
final_iris <- Map(fixmax,
                  list(iris$Sepal.Length,
                       iris$Sepal.Width,
                       iris$Petal.Length,
                       iris$Petal.Width,
                       iris$Species),
                  list(iris2$Sepal_Length,
                       iris2$Sepal_Width,
                       iris2$Petal_Length,
                       iris2$Petal_Width,
                       iris2$Species),
                  lst1,
                  lst2)
I suspect that you want iris and iris2 to be supplied to each call to fixmax(). In order to have Map() recycle them like this, they need to be single-element lists. That is, you probably want:
final_iris<- Map(fixmax, list(iris), list(iris2),lst1,lst2)
To combine a list of data frames into a single data frame, use:
do.call(rbind, final_iris)
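Note that each data frame returned by the Map() call above has only one column adjusted, so rbind stacks four 150-row copies rather than producing one fully adjusted data frame. If the goal is a single data frame in which every variable has been capped, one option is to feed the result of each call into the next, for example with Reduce(). A minimal sketch, assuming lst1 and lst2 as in the question; fixmax is rewritten here in base R so the block is self-contained, and iris_big is a hypothetical name for the shifted data:

```r
# set up data as in the question (base R only)
iris_big <- datasets::iris
iris_big[1:4] <- iris_big[1:4] + 2
iris2 <- datasets::iris
names(iris2) <- sub(".", "_", names(iris2), fixed = TRUE)

# base R stand-in for the question's fixmax (same capping rule)
fixmax <- function(data, data2, var1, var2) {
  cap <- max(data2[[var2]], na.rm = TRUE)
  data[[var1]] <- ifelse(data[[var1]] >= cap, cap, data[[var1]])
  data
}

lst1 <- names(iris_big)[1:4]
lst2 <- names(iris2)[1:4]

# Reduce carries the data frame from one step to the next, so each call
# to fixmax starts from the output of the previous one
final_iris <- Reduce(
  function(df, i) fixmax(df, iris2, lst1[i], lst2[i]),
  seq_along(lst1),
  iris_big
)

sapply(final_iris[1:4], max)
```

A plain for loop over seq_along(lst1) that reassigns final_iris on each pass would do the same thing.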