I am having trouble mutating a subset of rows in dplyr. I am using the chaining operator %>% to say:
data <- data %>%
filter(ColA == "ABC") %>%
mutate(ColB = "XXXX")
This works fine, but the problem is that I want to keep the entire original table and have the mutate applied only to the subset of rows I specified. When I view data after this, I only see the subset of rows and their updated ColB values.
I would also like to know how to do this using data.table.
Thanks.
When you use filter(), you are actually removing the rows that do not match the condition you specified, so they will not show up in the final data set.
Does ColB already exist in your data frame? If so,
data %>%
mutate(ColB = ifelse(ColA == "ABC", "XXXX", ColB))
will change ColB to "XXXX" when ColA == "ABC" and leave it as is otherwise. If ColB does not already exist, then you will have to specify what to do for rows where ColA != "ABC", for example:
data %>%
mutate(ColB = ifelse(ColA == "ABC", "XXXX", NA))
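To make the difference concrete, here is a minimal sketch on a made-up data frame. The `toy` data and its values are assumptions for illustration only; the column names follow the question. It compares what survives the filter() pipeline with what the conditional mutate() returns.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question
toy <- data.frame(
  ColA = c("ABC", "DEF", "ABC", "GHI"),
  ColB = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# filter() then mutate(): only the matching rows survive
filtered <- toy %>%
  filter(ColA == "ABC") %>%
  mutate(ColB = "XXXX")
nrow(filtered)  # 2

# Conditional mutate(): every row is kept, only matching rows change
updated <- toy %>%
  mutate(ColB = ifelse(ColA == "ABC", "XXXX", ColB))
nrow(updated)   # 4
updated$ColB    # "XXXX" "b" "XXXX" "d"
```

Remember to assign the result back (as in `updated <- ...`) if you want to keep it, since the pipe does not modify `toy` itself.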
Using data.table, we'd do:
setDT(data)[ColA == "ABC", ColB := "XXXX"]
and the values are modified in place, unlike the ifelse() approach, which would copy the entire column to replace just those rows where the condition is satisfied.
We call this sub-assign by reference. You can read more about it in the new HTML vignettes.
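As a rough illustration of sub-assign by reference, the sketch below runs the same kind of update on a small made-up table. The `toy` data is an assumption for demonstration; only the column names match the question.

```r
library(data.table)

# Hypothetical toy data; only the column names come from the question
toy <- data.frame(
  ColA = c("ABC", "DEF", "ABC"),
  ColB = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

setDT(toy)                          # convert to data.table by reference
toy[ColA == "ABC", ColB := "XXXX"]  # update only the matching rows, in place

nrow(toy)  # 3: no rows were dropped
toy$ColB   # "XXXX" "b" "XXXX"
```

Note that no assignment with `<-` is needed here, because `:=` changes `toy` directly. If ColB did not exist yet, the same call would create it and leave NA in the non-matching rows.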
Another option is to perform a subsequent combination of union and anti-join with the same data. This requires a primary key:
data <- data %>%
filter(ColA == "ABC") %>%
mutate(ColB = "XXXX") %>%
bind_rows(., anti_join(data, ., by = ...))  # bind_rows() replaces the deprecated rbind_list()
Example:
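# rownames_to_column() replaces the deprecated add_rownames(); the key column is "rowname"
mtcars_n <- mtcars %>% tibble::rownames_to_column()
mtcars_n %>%
filter(cyl > 6) %>%
mutate(mpg = 1) %>%
# bind_rows() replaces the deprecated rbind_list()
bind_rows(., anti_join(mtcars_n, ., by = "rowname"))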
This is much slower than probably any other approach, but useful to get quick results by extending your existing pipe.
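One thing to watch with this approach is row order: the modified rows end up first and the untouched rows follow. The sketch below, written with bind_rows() and tibble::rownames_to_column(), checks that no rows are lost and shows one possible way to restore the original order. The intermediate names `changed` and `result` are illustrative, not part of the original answer.

```r
library(dplyr)
library(tibble)

# Turn the row names into a key column called "rowname"
mtcars_n <- mtcars %>% rownames_to_column()

# Modify only the subset
changed <- mtcars_n %>%
  filter(cyl > 6) %>%
  mutate(mpg = 1)

# Union of the modified rows with the rows that were not selected
result <- bind_rows(changed, anti_join(mtcars_n, changed, by = "rowname"))
nrow(result) == nrow(mtcars_n)  # TRUE: every row is back

# The modified rows now come first; reorder by the key if order matters
result <- result[match(mtcars_n$rowname, result$rowname), ]
```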