R - Comparing values in a column and creating a new column with the results of this comparison. Is there a better way than looping?

Tags:

r

I'm an R beginner. Although I have read a lot in the manuals and on this board, I have to ask my first question. It is similar to a question asked here, but not quite the same, and I don't understand the explanation given there.
I have a data frame with hundreds of thousands of rows and 30 columns. For my question I created a simpler data frame that you can use:

a <- sample(c(1,3,5,9), 20, replace = TRUE)
b <- sample(c(1,NA), 20, replace = TRUE)
df <- data.frame(a,b)

Now I want to compare the values in the last column (here column b): for each row, I want to check whether its value is the same as the value in the next row. If it is the same, I want to write 0 into a new column in that row; otherwise the new column should contain 1.

Here is my code. It does not work, because every row of the new column contains 0:

m<-c()

for (i in seq(along=df[,1])){
    ifelse(df$b[i] == df$b[i+1],m <- 0, m <- 1)          
    df$mov <- m
}

The result I want looks like the example below. What is the mistake? And is there a better way than writing a loop? I suspect looping could be very slow on my big dataset.

   a  b mov
1  9 NA   0
2  1 NA   1
3  1  1   1
4  5 NA   0
5  1 NA   0
6  3 NA   0
7  3 NA   1
8  5  1   0
9  1  1   0
10 3  1   0
11 1  1   0
12 9  1   0
13 1  1   1
14 5 NA   0
15 9 NA   0
16 9 NA   0
17 9 NA   0
18 5 NA   0
19 3 NA   0
20 1 NA   0

Thank you for your help!

Simon1723 asked Nov 19 '25 14:11



1 Answer

There are a couple of things to consider in your example.

First, to avoid a loop, you can create a copy of the vector that is shifted by one position. (There are many ways to do this.) Comparing the original vector with the shifted copy then gives an element-by-element comparison of each position with its neighbor.
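As a small illustration of the shifting idea, here is a minimal sketch using `head()` and `tail()`, which is only one of the possible ways. The vector `x` and the names `cur`, `nxt` and `same_as_next` are made up for this example and do not come from the question.

```r
x <- c(5, 5, 3, 3, 3, 9)      # made-up example values, no NA yet

cur <- head(x, -1)            # every element except the last
nxt <- tail(x, -1)            # every element except the first, i.e. x shifted by one

same_as_next <- cur == nxt    # vectorized, element-by-element comparison
same_as_next                  # TRUE FALSE TRUE TRUE FALSE
```

The result has one element fewer than `x`, because the last value has no neighbor to compare with.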

Second, equality comparisons don't work with NA: any comparison involving NA returns NA. So NA == NA is not TRUE, it is NA! Again, there are many ways to get around this, but here I have simply replaced all the NAs in the temporary vector with a placeholder that works in equality tests.
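The NA behavior can be checked directly at the console. The sketch below also shows one possible alternative to a placeholder: a hypothetical helper, `same_value()`, that is not part of the original answer and treats two NAs as equal.

```r
NA == NA       # NA, not TRUE
1 == NA        # NA, not FALSE
is.na(NA)      # TRUE: is.na() is the reliable way to test for NA

# Hypothetical helper (an assumption for illustration):
# TRUE if both values are NA, or both are non-NA and equal; never returns NA.
same_value <- function(x, y) {
  (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
}

same_value(c(NA, 1, 1, 1), c(NA, NA, 1, 3))   # TRUE FALSE TRUE FALSE
```

This works because `FALSE & NA` evaluates to `FALSE` in R, so the NA produced by `x == y` is masked whenever either value is missing.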

Finally, you have to decide what you want to do with the last value (which doesn't have a neighbor). Here I have put 1, which is your assignment for "doesn't match its neighbor".

So, provided the placeholder cannot occur among the possible values of b, you could do:

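c = df$b                # working copy of column b
z = length(c)
c[is.na(c)] = 'x'       # replace NA with a placeholder so the equality test works (c becomes character)
# 1 where a row differs from the next row, 0 where it matches; 1 is appended for the last row.
# Note the parentheses: 1:z-1 would be parsed as (1:z)-1, not 1:(z-1).
df$mov = c(1 * !(c[1:(z-1)] == c[2:z]), 1)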
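For comparison, here is a sketch of the same computation without a placeholder, so that column b is never coerced to character. It is an alternative under stated assumptions, not the method used above: the seed is arbitrary, the names `cur`, `nxt` and `same` are made up, and the last row is again set to 1.

```r
set.seed(1)                                    # arbitrary seed, only for reproducibility
df <- data.frame(a = sample(c(1, 3, 5, 9), 20, replace = TRUE),
                 b = sample(c(1, NA), 20, replace = TRUE))

cur <- head(df$b, -1)                          # rows 1 .. n-1
nxt <- tail(df$b, -1)                          # rows 2 .. n

# TRUE when both values are NA, or both are non-NA and equal; never NA
same <- (is.na(cur) & is.na(nxt)) | (!is.na(cur) & !is.na(nxt) & cur == nxt)

df$mov <- c(as.integer(!same), 1L)             # 0 = same as next row, 1 = different; last row set to 1
```

Because everything is vectorized, this should scale to hundreds of thousands of rows without an explicit loop. If the last row should be 0 instead, as in the desired output shown in the question, replace the trailing `1L` with `0L`.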
beroe answered Nov 22 '25 04:11



