I need to scale a dataframe.
The process is as follows:
Divide every element in a row by the maximum value in that row, unless the row contains the number 1.
I use this approach:
post_df <- df # original dataframe
for (i in 1:nrow(df)) {
  if (!(1 %in% df[i, ])) {
    post_df[i, ] <- df[i, ] / max(df[i, ])
  }
}
Is there a faster approach that would save some time? I run this on a large dataframe (86,000 rows × 500 columns).
For example:
Row 1: Divide all elements by 0.7
Row 2: Divide all elements by 0.4
Row 3: Ignore
Row 4: Ignore
Row 5: Ignore
Based on the description, only the rows that do not contain a 1 need to be scaled. Create a logical index ('i1') with rowSums, subset the dataset with 'i1', get the maximum of each row with pmax, divide the subset by those maxima, and assign the result back to the subset.
i1 <- !(rowSums(df == 1) > 0)
df[i1, ] <- df[i1, ] / do.call(pmax, df[i1, ])
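To see why do.call(pmax, ...) yields row maxima, here is a minimal sketch. The data frame `toy` and its values are made up for illustration and do not come from the question.

```r
# Hypothetical toy data, not from the question
toy <- data.frame(x = c(3, 8, 2), y = c(5, 1, 9), z = c(4, 6, 7))

# do.call() passes each column to pmax() as a separate argument, so the
# element-wise maximum across columns is the maximum of each row
row_max <- do.call(pmax, toy)

# Same values as a row-wise apply(), which loops over the rows instead
stopifnot(all(row_max == apply(toy, 1, max)))
```

Because pmax works on whole columns at once, it avoids visiting the rows one at a time, which is where the loop in the question spends its time.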
# example data
set.seed(24)
df <- as.data.frame(matrix(sample(1:8, 10*5, replace = TRUE), ncol=5))
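As a sanity check, the sketch below runs the loop from the question and the vectorized version side by side and confirms that they agree. The data frame `small` and the names `loop_df` and `vec_df` are assumptions chosen for this example, and the values are made up.

```r
# Hypothetical data: rows 3 and 4 contain a 1 and should be left alone
small <- data.frame(a = c(0.7, 0.2, 1.0, 0.5),
                    b = c(0.35, 0.4, 0.3, 1.0),
                    c = c(0.14, 0.1, 0.6, 0.25))

# Reference result: the row-by-row loop from the question
loop_df <- small
for (i in seq_len(nrow(small))) {
  if (!(1 %in% small[i, ])) {
    loop_df[i, ] <- small[i, ] / max(small[i, ])
  }
}

# Vectorized version: i1 is TRUE for rows that contain no 1
i1 <- rowSums(small == 1) == 0
vec_df <- small
vec_df[i1, ] <- small[i1, ] / do.call(pmax, small[i1, ])

# Both approaches should give the same data frame
stopifnot(isTRUE(all.equal(loop_df, vec_df)))
```

Writing the index as rowSums(small == 1) == 0 is equivalent to the negated form above when the data has no missing values. How much time the vectorized version saves on an 86,000 × 500 dataframe is best measured on the real data, for example with system.time().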