I need to scale a data frame.
The rule I need to follow is:
Divide every element in a row by the maximum value in that row, unless the row contains the number 1.
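Here is a minimal sketch of the rule applied to one row. It assumes the row is a plain numeric vector, and the helper name `scale_row` is hypothetical:

```r
# Hypothetical helper: scale one numeric row by its maximum,
# unless the row contains the value 1 (then return it unchanged).
scale_row <- function(x) {
  if (1 %in% x) {
    return(x)
  }
  x / max(x)
}

scale_row(c(0.35, 0.7, 0.14))  # no 1 present: every element divided by 0.7
scale_row(c(0.5, 1, 0.2))      # contains 1: returned as-is
```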
I use this approach:
post_df <- df  # original dataframe
for (i in 1:nrow(df)) {
  if (!1 %in% df[i, ]) {
    post_df[i, ] <- df[i, ] / max(df[i, ])
  }
}
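To keep the loop, one common speed-up (not from the original post) is to convert the data frame to a numeric matrix once. Extracting a row from a matrix is typically much cheaper than `df[i, ]` on a data frame. This sketch assumes every column is numeric, and `scale_rows_loop` is a hypothetical name:

```r
# Hypothetical helper: same rule as the data-frame loop,
# but iterating over the rows of a numeric matrix.
scale_rows_loop <- function(df) {
  m <- as.matrix(df)  # assumes all columns are numeric
  for (i in seq_len(nrow(m))) {
    row <- m[i, ]
    if (!(1 %in% row)) {
      m[i, ] <- row / max(row)
    }
  }
  as.data.frame(m)  # convert back to a data frame
}

scale_rows_loop(data.frame(a = c(0.35, 0.5), b = c(0.7, 1)))
```

The second row contains a 1, so only the first row is divided by its maximum (0.7).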
Is there a faster approach? I run this on a large data frame (86,000 rows × 500 columns), so cutting even a few seconds would help.
For example:
Row 1: divide all elements by 0.7
Row 2: divide all elements by 0.4
Row 3: ignore
Row 4: ignore
Row 5: ignore
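The data behind this example is not shown. The sketch below therefore uses a hypothetical 5-row data frame (`df_ex`) constructed to match the example: row 1 has maximum 0.7, row 2 has maximum 0.4, and rows 3 to 5 each contain a 1. It computes the divisor each row would get:

```r
# Hypothetical data built to match the example's row maxima
df_ex <- data.frame(
  V1 = c(0.70, 0.20, 1.0, 0.3, 0.5),
  V2 = c(0.35, 0.40, 0.6, 1.0, 0.2),
  V3 = c(0.14, 0.10, 0.2, 0.9, 1.0)
)

# TRUE for rows that do NOT contain a 1
needs_scaling <- apply(df_ex, 1, function(r) !(1 %in% r))

# Divisor per row: the row maximum, or NA for rows that are ignored
divisor <- ifelse(needs_scaling, apply(df_ex, 1, max), NA)
divisor
```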
Based on the description, we only need to scale the rows that don't contain 1. Create a logical index ('i1') with rowSums, subset the dataset with 'i1', get the max of each row with pmax, divide the subset by those maxima and assign the result back to the subset:
i1 <- !rowSums(df == 1) > 0  # TRUE for rows that contain no 1
df[i1, ] <- df[i1, ] / do.call(pmax, df[i1, ])  # divide each row by its max
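`do.call(pmax, d)` passes each column of the data frame `d` to `pmax` as a separate argument. The result is therefore the element-wise maximum across columns, i.e. the maximum of each row. A small illustration with made-up data:

```r
d <- data.frame(a = c(1, 5, 2), b = c(3, 2, 2), c = c(0, 4, 7))

# Equivalent to pmax(d$a, d$b, d$c): the maximum of each row
row_max <- do.call(pmax, d)
row_max
```

Dividing `d` by `row_max` then recycles that vector down every column, so element `[i, j]` is divided by the maximum of row `i`.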
# example data
set.seed(24)
df <- as.data.frame(matrix(sample(1:8, 10 * 5, replace = TRUE), ncol = 5))
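For a frame as large as 86,000 × 500 that is entirely numeric, you can also do the whole computation on a matrix. This is a swapped-in variant, not part of the answer above. The sketch assumes all columns are numeric with no `NA` values and uses a hypothetical helper `scale_rows_matrix`. It finds row maxima with base R's `max.col` and then checks the result against the loop from the question on the example data (the loop uses `unlist()` on each row so the comparison works on plain vectors):

```r
# Hypothetical helper: vectorised version of the rule on a numeric matrix
scale_rows_matrix <- function(df) {
  m <- as.matrix(df)               # assumes numeric columns, no NA values
  keep <- rowSums(m == 1) == 0     # TRUE for rows that contain no 1
  sub <- m[keep, , drop = FALSE]
  # Row maxima: index each row at the column holding its maximum
  row_max <- sub[cbind(seq_len(nrow(sub)), max.col(sub, ties.method = "first"))]
  m[keep, ] <- sub / row_max       # row_max recycles down each column
  as.data.frame(m)
}

# Example data from the answer
set.seed(24)
df <- as.data.frame(matrix(sample(1:8, 10 * 5, replace = TRUE), ncol = 5))

# Reference result from the row-by-row loop
post_df <- df
for (i in seq_len(nrow(df))) {
  row <- unlist(df[i, ])
  if (!(1 %in% row)) {
    post_df[i, ] <- df[i, ] / max(row)
  }
}

all.equal(scale_rows_matrix(df), post_df)
```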