How to normalise subgroups from a grouped data frame in R

I have a data frame with two numerical variables, fatcontent and saltcontent, plus two factor variables, cond and spice, that describe the different treatments. Each measurement of the numerical variables was taken twice.

a <- data.frame(cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"),
                       each = 2, times = 3),
                spice = rep(c("none", "chilli", "basil"), each = 10),
                fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                               25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                               5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
                saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                                6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                                1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
                )

Now I wish to normalise the numerical variables for each spice group, which in this case means dividing them by the mean of the uncooked condition within that group.
E.g. for a$spice == "none":

       cond  spice fatcontent saltcontent  
1  uncooked   none          4           2  
2  uncooked   none          5           5  
3     fried   none       6828        4733  
4     fried   none       7530        5500  
5   steamed   none       6910        5724  
6   steamed   none       7132       15885  
7     baked   none       5885       14885  
8     baked   none        613         217  
9   grilled   none       2845         193  
10  grilled   none       2867         148

After normalisation:

       cond spice   fatcontent  saltcontent
1  uncooked  none    0.8888889    0.5714286
2  uncooked  none    1.1111111    1.4285714
3     fried  none 1517.3333333 1352.2857143
4     fried  none 1673.3333333 1571.4285714
5   steamed  none 1535.5555556 1635.4285714
6   steamed  none 1584.8888889 4538.5714286
7     baked  none 1307.7777778 4252.8571429
8     baked  none  136.2222222   62.0000000
9   grilled  none  632.2222222   55.1428571
10  grilled  none  637.1111111   42.2857143

My question is: how can I do this for all the groups and variables in the data frame? I assume I could use the dplyr package, but I am not sure what the best way is. I appreciate any help!

asked Dec 12 '14 01:12

karnowski

2 Answers

A succinct way to normalize the data is to put the "uncooked" condition directly inside the mean calculation, so you don't need to filter, summarise, join and recalculate. With mutate_each you write the expression once and it is applied to every non-grouping column except cond.

group_by(a, spice) %>%
  mutate_each(funs(./mean(.[cond == "uncooked"])), -cond)

#Source: local data frame [30 x 4]
#Groups: spice
#
#       cond  spice   fatcontent  saltcontent
#1  uncooked   none    0.8888889 5.714286e-01
#2  uncooked   none    1.1111111 1.428571e+00
#3     fried   none 1517.3333333 1.352286e+03
#4     fried   none 1673.3333333 1.571429e+03
#5   steamed   none 1535.5555556 1.635429e+03
#6   steamed   none 1584.8888889 4.538571e+03
#7     baked   none 1307.7777778 4.252857e+03
#8     baked   none  136.2222222 6.200000e+01
#9   grilled   none  632.2222222 5.514286e+01
#10  grilled   none  637.1111111 4.228571e+01
# ... etc
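
For readers on current dplyr: mutate_each() and funs() have since been deprecated, and across() (dplyr 1.0.0 and later) is the replacement. Below is a minimal sketch of the same idea, assuming a recent dplyr. Note that a_small is a hypothetical cut-down copy of the question's data frame (only the uncooked and fried rows for two spices), used to keep the example short; it is not part of the original answer.

```r
library(dplyr)

# Hypothetical subset of the question's data: two conditions, two spices
a_small <- data.frame(
  cond = rep(c("uncooked", "fried"), each = 2, times = 2),
  spice = rep(c("none", "chilli"), each = 4),
  fatcontent = c(4, 5, 6828, 7530, 25, 18, 2385, 33227),
  saltcontent = c(2, 5, 4733, 5500, 6, 4, 26738, 24738)
)

normalised <- a_small %>%
  group_by(spice) %>%
  # Within each spice group, divide every value by the mean of the
  # uncooked rows of the same column
  mutate(across(c(fatcontent, saltcontent),
                ~ .x / mean(.x[cond == "uncooked"]))) %>%
  ungroup()

normalised
```

Listing the columns explicitly avoids touching cond; with many numeric columns, a selection such as where(is.numeric) inside across() should work as well.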

answered Sep 20 '22 04:09

talat

I think this is what you are after. You want the mean of the uncooked data points for each spice group, so the first step computes those means and stores them in ana. Next, left_join merges ana into your data frame a, which adds the fatmean and saltmean columns; if your data is really huge, this may not be memory efficient. Then mutate does the division for each spice group. Finally, select drops the two helper columns to tidy up the result.

### Find mean for each spice condition using uncooked data points                
ana <- group_by(filter(a, cond == "uncooked"), spice) %>%
       summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent)) 

 #   spice fatmean saltmean
 #1  basil     5.0      1.0
 #2 chilli    21.5      5.0
 #3   none     4.5      3.5

left_join(a, ana, by = "spice") %>%
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
       saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))

# A part of the results
#       cond spice   fatcontent  saltcontent
#1  uncooked  none    0.8888889    0.5714286
#2  uncooked  none    1.1111111    1.4285714
#3     fried  none 1517.3333333 1352.2857143
#4     fried  none 1673.3333333 1571.4285714
#5   steamed  none 1535.5555556 1635.4285714
#6   steamed  none 1584.8888889 4538.5714286
#7     baked  none 1307.7777778 4252.8571429
#8     baked  none  136.2222222   62.0000000
#9   grilled  none  632.2222222   55.1428571
#10  grilled  none  637.1111111   42.2857143

If you do everything in a single pipeline, it looks like this:

group_by(filter(a, cond == "uncooked"), spice) %>%
    summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent)) %>%
    left_join(a, ., by = "spice") %>% #right_join is possible with the dev dplyr
    group_by(spice) %>%
    mutate(fatcontent = fatcontent / fatmean,
           saltcontent = saltcontent / saltmean) %>%
    select(-c(fatmean, saltmean))
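
The same "compute the reference means, then look them up and divide" idea can also be sketched without dplyr. This is an illustrative base R alternative, not part of the original answer; a_small is a hypothetical cut-down copy of the question's data, and num_cols is an assumed name for the columns to normalise.

```r
# Hypothetical subset of the question's data: two conditions, two spices
a_small <- data.frame(
  cond = rep(c("uncooked", "fried"), each = 2, times = 2),
  spice = rep(c("none", "chilli"), each = 4),
  fatcontent = c(4, 5, 6828, 7530, 25, 18, 2385, 33227),
  saltcontent = c(2, 5, 4733, 5500, 6, 4, 26738, 24738)
)

num_cols <- c("fatcontent", "saltcontent")
is_ref <- a_small$cond == "uncooked"
normalised <- a_small

for (col in num_cols) {
  # Mean of the uncooked rows, one value per spice (named by spice)
  ref <- tapply(a_small[[col]][is_ref], a_small$spice[is_ref], mean)
  # Look up each row's reference mean by spice name, then divide;
  # as.character() guards against spice being a factor
  divisor <- as.numeric(ref[as.character(a_small$spice)])
  normalised[[col]] <- a_small[[col]] / divisor
}

normalised
```

Unlike a join, the lookup keeps the original row order and never adds helper columns, so there is nothing to drop afterwards.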

answered Sep 22 '22 04:09

jazzurro
