I have a data frame with two numerical variables, fatcontent and saltcontent, plus two factor variables, cond and spice, that describe the different treatments. In this data frame, each measurement of the numerical variables was taken twice.
a <- data.frame(cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"),
each = 2, times = 3),
spice = rep(c("none", "chilli", "basil"), each = 10),
fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)
Now I wish to normalise (in this case, divide) the numerical variables for each spice group by the mean of the uncooked condition in that group.
E.g. for a$spice == "none"
cond spice fatcontent saltcontent
1 uncooked none 4 2
2 uncooked none 5 5
3 fried none 6828 4733
4 fried none 7530 5500
5 steamed none 6910 5724
6 steamed none 7132 15885
7 baked none 5885 14885
8 baked none 613 217
9 grilled none 2845 193
10 grilled none 2867 148
After normalisation:
cond spice fatcontent saltcontent
1 uncooked none 0.8888889 0.5714286
2 uncooked none 1.1111111 1.4285714
3 fried none 1517.3333333 1352.2857143
4 fried none 1673.3333333 1571.4285714
5 steamed none 1535.5555556 1635.4285714
6 steamed none 1584.8888889 4538.5714286
7 baked none 1307.7777778 4252.8571429
8 baked none 136.2222222 62.0000000
9 grilled none 632.2222222 55.1428571
10 grilled none 637.1111111 42.2857143
My question is: how can I do this for all the groups and variables in the data frame? I assume I could use the dplyr package, but I am not sure what the best way is. I appreciate any help!
A succinct way to normalize the data is to include the "uncooked" condition right in the mean calculation, so you don't need to filter, summarise, join and recalculate. Doing this with mutate_each means you only need to type it once.
group_by(a, spice) %>%
mutate_each(funs(./mean(.[cond == "uncooked"])), -cond)
#Source: local data frame [30 x 4]
#Groups: spice
#
# cond spice fatcontent saltcontent
#1 uncooked none 0.8888889 5.714286e-01
#2 uncooked none 1.1111111 1.428571e+00
#3 fried none 1517.3333333 1.352286e+03
#4 fried none 1673.3333333 1.571429e+03
#5 steamed none 1535.5555556 1.635429e+03
#6 steamed none 1584.8888889 4.538571e+03
#7 baked none 1307.7777778 4.252857e+03
#8 baked none 136.2222222 6.200000e+01
#9 grilled none 632.2222222 5.514286e+01
#10 grilled none 637.1111111 4.228571e+01
# ... etc
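mutate_each() and funs() are deprecated in more recent dplyr releases, where across() is the suggested replacement. The same idea can be sketched as follows, assuming dplyr 1.0.0 or later; the result name normalised and the choice of where(is.numeric) to select the columns are illustrative and not part of the original answer.

```r
library(dplyr)

# Same data frame `a` as in the question
a <- data.frame(
  cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"),
             each = 2, times = 3),
  spice = rep(c("none", "chilli", "basil"), each = 10),
  fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                 25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                 5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
  saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                  6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                  1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)

# Within each spice group, divide every numeric column by the mean of
# that column's "uncooked" rows (assumes dplyr >= 1.0.0 for across()).
normalised <- a %>%
  group_by(spice) %>%
  mutate(across(where(is.numeric),
                ~ .x / mean(.x[cond == "uncooked"]))) %>%
  ungroup()

head(normalised, 10)
```

Selecting with where(is.numeric) means any further numeric columns are normalised too; list the columns explicitly, e.g. c(fatcontent, saltcontent), if only some of them should be touched.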
I think this is what you are after. You want to find the mean for each spice condition using the uncooked data points; that is what I do in the first step, storing the result in ana. Then I add fatmean and saltmean from ana to your data frame, a, using left_join. If your data is really huge, this may not be a memory-efficient way. I then do the division in mutate for each spice condition. Finally, I drop the two helper columns with select to tidy up the results.
### Find mean for each spice condition using uncooked data points
ana <- group_by(filter(a, cond == "uncooked"), spice) %>%
summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent))
# spice fatmean saltmean
#1 basil 5.0 1.0
#2 chilli 21.5 5.0
#3 none 4.5 3.5
left_join(a, ana, by = "spice") %>%
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))
# A part of the results
# cond spice fatcontent saltcontent
#1 uncooked none 0.8888889 0.5714286
#2 uncooked none 1.1111111 1.4285714
#3 fried none 1517.3333333 1352.2857143
#4 fried none 1673.3333333 1571.4285714
#5 steamed none 1535.5555556 1635.4285714
#6 steamed none 1584.8888889 4538.5714286
#7 baked none 1307.7777778 4252.8571429
#8 baked none 136.2222222 62.0000000
#9 grilled none 632.2222222 55.1428571
#10 grilled none 637.1111111 42.2857143
If you do everything in one pipeline, it would look something like this:
group_by(filter(a, cond == "uncooked"), spice) %>%
summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent)) %>%
left_join(a, ., by = "spice") %>% #right_join is possible with the dev dplyr
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))
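If the join and the temporary fatmean and saltmean columns are a concern, the same per-group reference mean can also be computed in base R with ave(). This is only a sketch of an alternative, not part of the answer above; the names num_cols, b, uncooked_only and ref_mean are made up for illustration.

```r
# Same data frame `a` as in the question
a <- data.frame(
  cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"),
             each = 2, times = 3),
  spice = rep(c("none", "chilli", "basil"), each = 10),
  fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                 25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                 5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
  saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                  6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                  1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)

# Names of all numeric columns (hypothetical helper variable)
num_cols <- names(a)[sapply(a, is.numeric)]

b <- a
for (col in num_cols) {
  # Keep only the "uncooked" values; everything else becomes NA
  uncooked_only <- ifelse(a$cond == "uncooked", a[[col]], NA)
  # Per-spice mean of the uncooked values, repeated for every row of the group
  ref_mean <- ave(uncooked_only, a$spice,
                  FUN = function(x) mean(x, na.rm = TRUE))
  b[[col]] <- a[[col]] / ref_mean
}

head(b, 10)
```

Here ave() returns a vector as long as the data frame, so the division lines up row by row without merging a summary table back in.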