R: How to mutate a subset of rows

I am having trouble mutating a subset of rows in dplyr. I am using the pipe operator %>% like this:

data <- data %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX")

This works, but the problem is that I want to keep the entire original table and see the mutation applied only to the subset of rows I specified. When I view data afterwards, I only see the subset with its updated ColB values.

I would also like to know how to do this using data.table.

Thanks.

mo_maat asked Apr 23 '15 at 23:04


3 Answers

When you use filter() you are actually removing the rows that do not match the condition you specified, so they will not show up in the final data set.
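To see this concretely, here is a minimal sketch on a small made-up data frame (the column names follow the question; the values are hypothetical). The filtered result keeps only the matching rows, and assigning it back to data throws the other rows away:

```r
library(dplyr)

# Hypothetical toy data using the question's column names
data <- data.frame(
  ColA = c("ABC", "DEF", "ABC"),
  ColB = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# The question's pipe: filter() drops every row where ColA != "ABC"
subset_only <- data %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX")

nrow(data)         # 3 rows in the original
nrow(subset_only)  # 2: only the rows where ColA == "ABC" survive
```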

Does ColB already exist in your data frame? If so,

data %>%
  mutate(ColB = ifelse(ColA == "ABC", "XXXX", ColB))

will change ColB to "XXXX" when ColA == "ABC" and leave it as is otherwise. If ColB does not already exist, then you will have to specify what to do for rows where ColA != "ABC", for example:

data %>%
  mutate(ColB = ifelse(ColA == "ABC", "XXXX", NA))
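Here is a self-contained sketch of both cases on a small hypothetical data frame. ColC stands in for a column that does not exist yet; it and the values are made up for illustration. stringsAsFactors = FALSE keeps ColB as character, since ifelse() on a factor column would return its integer codes instead of the labels.

```r
library(dplyr)

# Hypothetical toy data using the question's column names
data <- data.frame(
  ColA = c("ABC", "DEF", "ABC"),
  ColB = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# ColB exists: overwrite it only where ColA == "ABC", keep the other rows
updated <- data %>%
  mutate(ColB = ifelse(ColA == "ABC", "XXXX", ColB))
updated$ColB  # "XXXX" "b" "XXXX"

# ColC (hypothetical) does not exist yet: non-matching rows get NA
with_new <- data %>%
  mutate(ColC = ifelse(ColA == "ABC", "XXXX", NA))
with_new$ColC  # "XXXX" NA "XXXX"
```

All three rows are kept in both results, which is what the question asks for.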
Kara Woo answered Oct 22 '22 at 17:10


Using data.table, we'd do:

library(data.table)
setDT(data)[ColA == "ABC", ColB := "XXXX"]

and the values are modified in place, unlike ifelse(), which would copy the entire column just to replace the rows where the condition is satisfied.

We call this sub-assign by reference. You can read more about it in the data.table HTML vignettes.
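A minimal sketch of sub-assign by reference on a small hypothetical data frame (the values and the extra column ColC are made up for illustration). Note that there is no data <- ... reassignment: the := call changes data itself, and rows outside the subset keep their values.

```r
library(data.table)

# Hypothetical toy data using the question's column names
data <- data.frame(
  ColA = c("ABC", "DEF", "ABC"),
  ColB = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

setDT(data)  # converts data to a data.table by reference

# Only rows with ColA == "ABC" are updated; the others are left alone
data[ColA == "ABC", ColB := "XXXX"]
data$ColB  # "XXXX" "b" "XXXX"

# Sub-assigning a column that does not exist yet creates it,
# filling the rows outside the subset with NA
data[ColA == "ABC", ColC := "YYYY"]
data$ColC  # "YYYY" NA "YYYY"
```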

Arun answered Oct 22 '22 at 19:10


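Another option is to extend your pipe: modify the filtered rows, then bind back the original rows that an anti-join finds missing from the modified subset. This requires a primary key (a column, or set of columns, that uniquely identifies each row):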

data <- data %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX") %>%
  bind_rows(., anti_join(data, ., by = ...))  # by = your primary key column(s)

Example:

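library(dplyr)
library(tibble)

# rownames_to_column() stores the row names in a "rowname" column to use as the key
mtcars_n <- mtcars %>% rownames_to_column()
mtcars_n %>%
  filter(cyl > 6) %>%
  mutate(mpg = 1) %>%
  bind_rows(., anti_join(mtcars_n, ., by = "rowname"))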

This is probably slower than any of the other approaches, but it is a quick way to get results by extending your existing pipe. Note that the modified rows come first, so the original row order is not preserved.
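A self-contained sketch of the same idea on a small hypothetical data frame, where id is an assumed primary key column and the values are made up. Because the modified rows end up ahead of the untouched ones, the final arrange(id) step restores the original order by key.

```r
library(dplyr)

# Hypothetical toy data; id plays the role of the primary key
data <- data.frame(
  id   = 1:4,
  ColA = c("ABC", "DEF", "ABC", "GHI"),
  ColB = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

result <- data %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX") %>%
  # add back every original row whose id is not in the modified subset
  bind_rows(., anti_join(data, ., by = "id")) %>%
  # the modified rows come first, so sort by the key to restore the order
  arrange(id)

result$ColB  # "XXXX" "b" "XXXX" "d"
```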

krlmlr answered Oct 22 '22 at 19:10