
Change values in data frame in a specific row using dplyr

Tags: r, dplyr

Is it possible to restrict a data frame to a specific row and then change some values in one of the columns?

Let's say I calculate GROWTH as (SIZE_{t+1} - SIZE_t) / SIZE_t and then notice some strange values for GROWTH (e.g. 1000). The reason is a corrupt value in the corresponding SIZE variable, so I'd like to find and replace the corrupt value of SIZE.

If I type:

data <- mutate(filter(data, lead(GROWTH)==1000), SIZE = 2600)

then only the corrupt row is stored in data and the rest of my data frame is lost.

What I'd like to do instead is filter "data" on the left hand side to the corresponding row of the corrupt value and then mutate the incorrect variable (on the right hand side):

filter(data, lead(GROWTH)==1000)  <- mutate(filter(data, lead(GROWTH)==1000), SIZE = 2600) 

but that doesn't seem to work. Is there a way to handle this using dplyr? Many thanks in advance

FlowRyan asked May 26 '16 20:05


1 Answer

You can use ifelse() inside mutate(). Suppose your data frame has a corrupted SIZE value in row 3, which produces a very large GROWTH value in row 4, and you want to replace the SIZE in row 3 with some value, here 0.3 (I chose a value different from yours so that it is consistent with my sample data). Because the outlier in GROWTH appears one row after the corrupt SIZE, lead() shifts the condition up by one row, and its default argument fills the last row, which has no following value. Adjust the GROWTH > 1000 condition as needed.

data
          SIZE       GROWTH
1  -1.49578498           NA
2  -0.38731784   -0.7410605
3   0.00010000   -1.0002582
4   0.53842217 5383.2216758
5  -0.65813674   -2.2223433
6   0.29830698   -1.4532599
7   0.04712019   -0.8420413
8  -0.07312482   -2.5518788
9   1.64310713  -23.4698959
10  1.44927727   -0.1179654

library(dplyr)
data %>% mutate(SIZE = ifelse(lead(GROWTH > 1000, default = FALSE), 0.3, SIZE))
          SIZE       GROWTH
1  -1.49578498           NA
2  -0.38731784   -0.7410605
3   0.30000000   -1.0002582
4   0.53842217 5383.2216758
5  -0.65813674   -2.2223433
6   0.29830698   -1.4532599
7   0.04712019   -0.8420413
8  -0.07312482   -2.5518788
9   1.64310713  -23.4698959
10  1.44927727   -0.1179654
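Note that the output above still shows the old GROWTH value in row 4, and the result is only printed, not stored. A minimal sketch of the full round trip follows, assuming GROWTH is computed as (SIZE - lag(SIZE)) / lag(SIZE), which matches the sample data; the toy values, the replacement value 120, and the name `fixed` are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: the third SIZE value is corrupt (far too small)
data <- data.frame(SIZE = c(100, 110, 0.1, 130, 143))

fixed <- data %>%
  # GROWTH in each row compares SIZE with the SIZE of the previous row
  mutate(GROWTH = (SIZE - lag(SIZE)) / lag(SIZE)) %>%
  # Flag the row just before each GROWTH outlier and overwrite its SIZE
  mutate(SIZE = ifelse(lead(GROWTH > 1000, default = FALSE), 120, SIZE)) %>%
  # Recompute GROWTH from the corrected SIZE so the outlier disappears
  mutate(GROWTH = (SIZE - lag(SIZE)) / lag(SIZE))

fixed
```

Assigning the pipeline result to a name keeps every row of the data frame, which is what the filter() approach in the question loses.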

Data:

data <- structure(list(SIZE = c(-1.49578498093657, -0.387317841955887,
1e-04, 0.538422167582116, -0.658136741561064, 0.298306980856383, 
0.0471201873908915, -0.0731248216938637, 1.64310713116132, 1.44927727104653
), GROWTH = c(NA, -0.741060482026387, -1.00025818588551, 5383.22167582116, 
-2.22234332311492, -1.45325988053609, -0.842041284935343, -2.55187883883499, 
-23.4698958999199, -0.117965442690154)), class = "data.frame", .Names = c("SIZE", 
"GROWTH"), row.names = c(NA, -10L))
Psidom answered Sep 21 '22 20:09