scale/normalize columns by group

Tags: r, scale, dplyr, plyr

I have a data frame that looks like this:

  Store Temperature Unemployment Sum_Sales
1     1       42.31        8.106   1643691
2     1       38.51        8.106   1641957
3     1       39.93        8.106   1611968
4     1       46.63        8.106   1409728
5     1       46.50        8.106   1554807
6     1       57.79        8.106   1439542

For each Store, I want to normalize (min-max scale to the range 0 to 1) two columns, "Sum_Sales" and "Temperature".

Desired output:

  Store Temperature Unemployment Sum_Sales
1     1       1.000        8.106   1.00000
2     1       0.000        8.106   0.94533
3     1       0.374        8.106   0.00000
4     2       0.012        8.106   0.00000
5     2       0.000        8.106   1.00000
6     2       1.000        8.106   0.20550

Here is the normalizing function that I created:

 normalit<-function(m){
   (m - min(m))/(max(m)-min(m))
 }
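For reference, normalit() is min-max scaling: the smallest value of a vector maps to 0, the largest to 1, and everything else falls linearly in between. A small base-R check (the vector x is made up for illustration):

```r
normalit <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

x <- c(2, 4, 6, 10)
# Each element becomes (x - 2) / (10 - 2)
normalit(x)
#> [1] 0.00 0.25 0.50 1.00
```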

What I have tried:

df2 <- df %.%
  group_by('Store') %.%
  summarise(Temperature = normalit(Temperature), Sum_Sales = normalit(Sum_Sales))

Any suggestions/help would be greatly appreciated. Thanks.


asked Nov 15 '14 19:11

itjcms18

2 Answers

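The issue is that you are using the wrong dplyr verb. summarise() is meant to reduce each group to a single summary value per variable (like mean() does). What you want is mutate(), which changes or adds columns and returns one value per row, so the result has the same number of rows as the original. See http://cran.rstudio.com/web/packages/dplyr/vignettes/dplyr.html. Two smaller points: group_by() takes the bare column name Store rather than the string 'Store', and the %.% operator has been replaced by the %>% pipe. Below are two approaches using dplyr: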

df %>%
    group_by(Store) %>%
    mutate(Temperature = normalit(Temperature), Sum_Sales = normalit(Sum_Sales))

df %>%
    group_by(Store) %>%
    mutate_each(funs(normalit), Temperature, Sum_Sales)
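In current dplyr, mutate_each() and funs() are deprecated, and across() (added in dplyr 1.0.0) is the usual way to apply one function to several columns. A sketch, assuming dplyr >= 1.0.0 and the two-store sample data from the data.table answer:

```r
library(dplyr)

normalit <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

df <- data.frame(
  Store        = c(1, 1, 1, 2, 2, 2),
  Temperature  = c(42.31, 38.51, 39.93, 46.63, 46.50, 57.79),
  Unemployment = 8.106,
  Sum_Sales    = c(1643691, 1641957, 1611968, 1409728, 1554807, 1439542)
)

# Apply normalit() to both columns separately within each Store;
# mutate() keeps every row, unlike summarise()
df2 <- df %>%
  group_by(Store) %>%
  mutate(across(c(Temperature, Sum_Sales), normalit)) %>%
  ungroup()

df2
```

Unemployment is left as it was, since across() only touches the columns it is given.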

Note: the Store column differs between your sample data (every row is store 1) and your desired result (stores 1 and 2). I assumed the two-store data that @jlhoward used is the right one.


answered Oct 18 '22 20:10

Vincent

Here's a data.table solution. I changed your example slightly so that it has two stores.

df <- read.table(header=T,text="Store Temperature Unemployment Sum_Sales
1     1       42.31        8.106   1643691
2     1       38.51        8.106   1641957
3     1       39.93        8.106   1611968
4     2       46.63        8.106   1409728
5     2       46.50        8.106   1554807
6     2       57.79        8.106   1439542")

library(data.table)
DT <- as.data.table(df)
DT[,list(Temperature=normalit(Temperature),Sum_Sales=normalit(Sum_Sales)),
    by=list(Store,Unemployment)]
#    Store Unemployment Temperature Sum_Sales
# 1:     1        8.106  1.00000000 1.0000000
# 2:     1        8.106  0.00000000 0.9453393
# 3:     1        8.106  0.37368421 0.0000000
# 4:     2        8.106  0.01151461 0.0000000
# 5:     2        8.106  0.00000000 1.0000000
# 6:     2        8.106  1.00000000 0.2055018
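If you would rather keep all columns and update the two columns in place, a common data.table idiom is := combined with lapply(.SD, ...) and .SDcols. A sketch, assuming the same sample data (rebuilt here so the block runs on its own); the cols helper vector is just for illustration:

```r
library(data.table)

normalit <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

DT <- data.table(
  Store        = c(1, 1, 1, 2, 2, 2),
  Temperature  = c(42.31, 38.51, 39.93, 46.63, 46.50, 57.79),
  Unemployment = 8.106,
  Sum_Sales    = c(1643691, 1641957, 1611968, 1409728, 1554807, 1439542)
)

cols <- c("Temperature", "Sum_Sales")

# Make sure the target columns are double: grouped := writes into the
# existing column type, so an integer column (as read.table would give
# for Sum_Sales) could not hold the scaled values
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

# Overwrite the listed columns by reference, one Store group at a time;
# Unemployment and the row order are left untouched
DT[, (cols) := lapply(.SD, normalit), by = Store, .SDcols = cols]

DT
```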

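Note that your normalization will have problems if a store has only one row: max(m) - min(m) is then 0, so normalit() computes 0/0 and returns NaN. The same happens for any store whose values in a column are all identical.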
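One way to guard against that is to check the range before dividing. A minimal sketch; normalit_safe() is a hypothetical name, and returning 0 for a constant group is an arbitrary choice (NA would be just as defensible). It assumes the column has no missing values:

```r
# Hypothetical variant of normalit(): returns 0 instead of NaN when a
# group has a single row or all-equal values (assumes no NAs in m)
normalit_safe <- function(m) {
  rng <- max(m) - min(m)
  if (rng == 0) {
    return(rep(0, length(m)))
  }
  (m - min(m)) / rng
}

normalit_safe(5)            # single-row group
#> [1] 0
normalit_safe(c(3, 3, 3))   # constant group
#> [1] 0 0 0
normalit_safe(c(2, 4, 6))
#> [1] 0.0 0.5 1.0
```

It can be dropped into either answer in place of normalit().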

answered Oct 18 '22 21:10

jlhoward
