I have a dataframe similar to this one
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)
Now I would like to scale the values in p1 and p2 depending on their ID. So the whole column would not be scaled at once; rather, scaling should be done separately for all values with ID 1, then for all values with ID 2, and so on. The same goes for p2. The new dataframe should contain the scaled values.
I already tried
df_scaled <- ddply(my.df, my.df$ID, scale(my.df$p1))
but I get the error message:
.fun is not a function
Thanks for your help!
dplyr makes this easy:
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)
library(dplyr)
df_scaled <- my.df %>% group_by(ID) %>% mutate(p1 = as.numeric(scale(p1)), p2 = as.numeric(scale(p2)))
Note that scale() returns a one-column matrix rather than a plain vector, which the stable version of dplyr at the time mishandled inside mutate(); wrapping the call in as.numeric() drops the matrix attributes and sidesteps the bug (alternatively, update to the dev version — see comments).
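If you prefer to avoid the dplyr dependency, base R's ave() applies a function within each group and returns the results in the original row order. A minimal sketch of the same per-ID scaling (as.numeric() again strips the matrix attributes that scale() attaches):

```r
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

# ave() splits the column by ID, applies FUN to each piece,
# and puts the pieces back in place
my.df$p1 <- ave(my.df$p1, my.df$ID, FUN = function(v) as.numeric(scale(v)))
my.df$p2 <- ave(my.df$p2, my.df$ID, FUN = function(v) as.numeric(scale(v)))
```

After this, each ID group in p1 and p2 has mean 0 and standard deviation 1.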