I am trying to apply the Winsorize() function from the DescTools package using lapply. What I currently have is:
data$col1 <- Winsorize(data$col1)
This replaces the extreme values with values based on quantiles, changing the data below as follows (changed values in bold):
> data$col1
 [1]   -0.06775798   **-0.55213508**   -0.12338265
 [4]    0.04928349    **0.47524313**    0.04782829
 [7]   -0.05070639 **-112.67126382**    0.12657896
[10]   -0.12886632
> Winsorize(data$col1)
 [1] -0.06775798 **-0.37884540** -0.12338265  0.04928349
 [5]  **0.26038103**  0.04782829 -0.05070639 **-0.37884540**
 [9]  0.12657896 -0.12886632
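For reference, the clipping step described above can be sketched in base R. This is only an illustration of the idea, not the DescTools implementation: `winsorize_sketch` is a hypothetical helper, and the 5%/95% cutoffs and the default quantile method are assumptions, so the exact limits may differ from what a given version of Winsorize() returns.

```r
# Hypothetical helper (not part of DescTools): clamp values to the
# chosen lower and upper sample quantiles.
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  # Assumed cutoffs; quantile() uses its default method here.
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Raise anything below the lower limit, lower anything above the upper one.
  pmin(pmax(x, limits[1]), limits[2])
}

col1 <- c(-0.06775798, -0.55213508, -0.12338265, 0.04928349, 0.47524313,
          0.04782829, -0.05070639, -112.67126382, 0.12657896, -0.12886632)

clipped <- winsorize_sketch(col1)
range(col1)     # range before clipping
range(clipped)  # range after clipping is narrower
```

Only the values outside the two limits are changed; everything in between is returned as it was.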
I have a for loop that does this across all columns of the data.frame (col1, col2, col3, col4). However, I understand lapply is a better option, so I am trying to rewrite it with lapply but cannot get it working. If anybody can point me in the right direction, it would be much appreciated.
The data:
data <- structure(list(EQ.TA = c(-0.0677579847115102, -0.552135083517749, 
-0.123382654164705, 0.0492834931482554, 0.475243125304193, 0.0478282913638668, 
-0.050706389027946, -112.671263815473, 0.126578956975704, -0.128866322940619
), NI.EQ = c(3.64670235329765, 1.66115713369585, 0.209424623633739, 
0.340430636358184, -0.248411254566261, -12.1709277350516, 1.06888235737433, 
0.0515582237132515, 0.177323118521857, 0.419879195374698), NI.TA = c(-0.24709320230217, 
-0.917183132749265, -0.0258393659113752, 0.0167776109344148, 
-0.118055740980805, -0.582114677880617, -0.0541991646381309, 
-5.80913022585296, 0.0224453753901758, -0.0541082879872031), 
    TL.TA = c(1.06775798471151, 1.55213508351775, 1.12338265416471, 
    0.950716506851745, 0.524756874695807, 0.952171708636133, 
    1.05070638902795, 113.671263815473, 0.873421043024296, 1.12886632294062
    )), .Names = c("EQ.TA", "NI.EQ", "NI.TA", "TL.TA"), row.names = c(NA, 
10L), class = "data.frame")
You can lapply over the whole data.frame and assign the result back into data[], which keeps it a data.frame, like this:
library(DescTools)
data[]<-lapply(data, Winsorize)
data
#          EQ.TA       NI.EQ       NI.TA      TL.TA
#1   -0.06775798  2.75320700 -0.24709320  1.0677580
#2   -0.55213508  1.66115713 -0.91718313  1.5521351
#3   -0.12338265  0.20942462 -0.02583937  1.1233827
#4    0.04928349  0.34043064  0.01677761  0.9507165
#5    0.31834425 -0.24841125 -0.11805574  0.6816558
#6    0.04782829 -6.80579532 -0.58211468  0.9521717
#7   -0.05070639  1.06888236 -0.05419916  1.0507064
#8  -62.21765589  0.05155822 -3.60775403 63.2176559
#9    0.12657896  0.17732312  0.01989488  0.8734210
#10  -0.12886632  0.41987920 -0.05410829  1.1288663
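The `data[] <-` part matters: lapply on its own returns a plain list, while assigning into `data[]` fills the existing columns and keeps the data.frame. A small sketch of the difference, using a hypothetical stand-in function `clip01` instead of Winsorize() so it runs without DescTools:

```r
# Hypothetical stand-in for Winsorize(): clamp values to the interval [0, 1].
clip01 <- function(x) pmin(pmax(x, 0), 1)

toy <- data.frame(a = c(-1, 0.5, 2), b = c(0.2, 3, -4))

# Plain lapply returns a named list, not a data.frame.
as_list <- lapply(toy, clip01)
class(as_list)

# Assigning into toy[] replaces the columns in place and keeps the class,
# column names, and row names.
kept <- toy
kept[] <- lapply(toy, clip01)
class(kept)
```

The same pattern works with any function that returns a vector of the same length as its input column.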
I like the answers above. For a recent research project, though, I had a data frame with variables of different types, and I only wanted to winsorize the numeric variables at the 1% level using lapply while keeping NA values. Building on the answer above, I think the following is a suitable extension:
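library(DescTools)
library(dplyr)  # provides bind_cols()
# Winsorize numeric columns at pct_level; return other columns unchanged
wins_vars <- function(x, pct_level = 0.01) {
  if (is.numeric(x)) {
    Winsorize(x, probs = c(pct_level, 1 - pct_level), na.rm = TRUE)
  } else {
    x
  }
}
df <- bind_cols(lapply(df, wins_vars))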
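If dplyr is not available, or the installed DescTools version does not accept the `probs` and `na.rm` arguments (I believe the Winsorize() signature has changed between releases, so check `?Winsorize`), the same idea can be sketched in base R. `wins_numeric` and the `mixed` data frame are hypothetical names used only for illustration, and the limits come from quantile() with its default method rather than from DescTools.

```r
# Hypothetical base-R variant: clamp numeric columns to the given quantiles,
# leave other column types untouched, and keep NA values as NA.
wins_numeric <- function(x, pct_level = 0.01) {
  if (!is.numeric(x)) return(x)
  limits <- quantile(x, probs = c(pct_level, 1 - pct_level),
                     na.rm = TRUE, names = FALSE)
  # pmin/pmax propagate NA, so missing values stay missing.
  pmin(pmax(x, limits[1]), limits[2])
}

# Made-up example data with a character column and a missing value.
mixed <- data.frame(id = c("a", "b", "c", "d", "e"),
                    value = c(1, NA, 3, 1000, 2),
                    stringsAsFactors = FALSE)

# Assigning into mixed[] keeps the data.frame without needing bind_cols().
mixed[] <- lapply(mixed, wins_numeric)
mixed
```

Note that integer columns come back as doubles here, since the quantile limits are generally not whole numbers.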