
dplyr broadcasting single value per group in mutate

Tags: r, dplyr

I am trying to do something very similar to "Scale relative to a value in each group (via dplyr)" (however, the solution there seems to crash R for me). I would like to take a single value from each group and add a new column in which that value is repeated for every row of the group. As an example, I have

library(dplyr)

data = expand.grid(
  category = LETTERS[1:2],
  year = 2000:2003)
data$value = runif(nrow(data))

data

  category year     value
1        A 2000 0.6278798
2        B 2000 0.6112281
3        A 2001 0.2170495
4        B 2001 0.6454874
5        A 2002 0.9234604
6        B 2002 0.9311204
7        A 2003 0.5387899
8        B 2003 0.5573527

And I would like a data frame like

data

  category year     value    value2
1        A 2000 0.6278798 0.6278798
2        B 2000 0.6112281 0.6112281
3        A 2001 0.2170495 0.6278798
4        B 2001 0.6454874 0.6112281
5        A 2002 0.9234604 0.6278798
6        B 2002 0.9311204 0.6112281
7        A 2003 0.5387899 0.6278798
8        B 2003 0.5573527 0.6112281

That is, value2 for each category is that category's value from year 2000. I was trying to think of a general solution that extends to any given filtering criterion, i.e. something like

data %>% group_by(category) %>% mutate(value = filter(data, year==2002))

However, this does not work because the assigned value has the wrong length.

mgilbert asked Dec 03 '15 20:12



1 Answer

Do this:

data %>% group_by(category) %>%
  mutate(value2 = value[year == 2000])

You could also do it this way:

data %>% group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = value[1])

or

data %>% group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = first(value))

or

data %>% group_by(category) %>%
  mutate(value2 = nth(value, n = 1, order_by = year))

or probably several other ways.
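As a rough check that the indexing and first() variants agree, here is a minimal sketch. The fixed seed and the names by_index and by_first are assumptions added for illustration; they are not part of the question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the run is repeatable
data <- expand.grid(category = LETTERS[1:2], year = 2000:2003)
data$value <- runif(nrow(data))

# Variant 1: pick the baseline inside each group with a logical index
by_index <- data %>%
  group_by(category) %>%
  mutate(value2 = value[year == 2000]) %>%
  ungroup() %>%
  arrange(category, year)

# Variant 2: sort by year, then take the first value of each group
by_first <- data %>%
  group_by(category) %>%
  arrange(year) %>%
  mutate(value2 = first(value)) %>%
  ungroup() %>%
  arrange(category, year)

# Both variants should produce the same broadcast column
identical(by_index$value2, by_first$value2)
```

Note that value[year == 2000] relies on each group having exactly one row that matches the condition. If a group has no match, I would expect mutate() to stop with a size error rather than fill in NA.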

Your attempt with mutate(value = filter(data, year==2002)) fails for two reasons.

  1. When you explicitly pass in data again, it's not part of the chain that got grouped earlier, so it doesn't know about the grouping.

  2. All dplyr verbs take a data frame as first argument and return a data frame, including filter. When you do value = filter(...) you're trying to assign a full data frame to the single column value.
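Both points can be seen directly, and they also suggest one more way to get the result: since filter() hands back a data frame, join it back on. This is only a sketch; the names baseline and result are made up for illustration, and the join is an alternative to the mutate() approaches above, not a replacement for them.

```r
library(dplyr)

data <- expand.grid(category = LETTERS[1:2], year = 2000:2003)
data$value <- runif(nrow(data))

# filter() returns a whole data frame (one row per category here),
# not a vector that could be assigned to a single column
baseline <- filter(data, year == 2000)
is.data.frame(baseline)
dim(baseline)

# Join the filtered rows back by category so that every row
# picks up the baseline value of its own group
result <- data %>%
  left_join(
    baseline %>% select(category, value2 = value),
    by = "category"
  )

result
```

Changing the condition inside filter() is then all it takes to broadcast a different baseline, provided the condition still leaves one row per category.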

Gregor Thomas answered Sep 26 '22 23:09