
assigning new values based on the location in the sequence

I am working in R. The data tracks changes in brain activity over time. The column "mark" indicates when a particular treatment begins and ends. For example, the first condition (mark==1) begins in row 3 and ends in row 6. The second experimental condition (mark==2) starts in row 9 and ends in row 12. Another batch of treatment one runs from row 15 to row 18.

ob.id <- 1:20
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)
# desired output column, written out by hand
condition <- c(0,0,1,1,1,1,0,0,2,2,2,2,0,0,1,1,1,1,0,0)
start <- data.frame(ob.id, mark)
result <- data.frame(ob.id, mark, condition)

> print(start)
   ob.id mark
1      1    0
2      2    0
3      3    1
4      4    0
5      5    0
6      6    1
7      7    0
8      8    0
9      9    2
10    10    0
11    11    0
12    12    2
13    13    0
14    14    0
15    15    1
16    16    0
17    17    0
18    18    1
19    19    0
20    20    0

I need to create a column that indicates which experimental condition each observation belongs to (0 if it belongs to none), like this:

> print(result)
   ob.id mark condition
1      1    0         0
2      2    0         0
3      3    1         1
4      4    0         1
5      5    0         1
6      6    1         1
7      7    0         0
8      8    0         0
9      9    2         2
10    10    0         2
11    11    0         2
12    12    2         2
13    13    0         0
14    14    0         0
15    15    1         1
16    16    0         1
17    17    0         1
18    18    1         1
19    19    0         0
20    20    0         0

Thanks for your help!

andrey asked Jul 24 '13 15:07




1 Answer

This is a fun little problem. The trick I use below is to first calculate the rle (run-length encoding) of the mark vector. This makes the problem simpler: every run of zeros collapses to a single 0 in the resulting values vector, and that 0 may or may not need to be replaced, depending on the values on either side of it.

# example vector with some edge cases
v = c(0,0,1,0,0,0,1,2,0,0,2,0,0,1,0,0,0,0,1,2,0,2)

v.rle = rle(v)
v.rle
#Run Length Encoding
#  lengths: int [1:14] 2 1 3 1 1 2 1 2 1 4 ...
#  values : num [1:14] 0 1 0 1 2 0 2 0 1 0 ...

vals = v.rle$values

# find the 0's that sit between two equal values and replace them by the
# previous value; idx points at the left neighbour, so the 0 is at idx + 1
idx = which(tail(head(vals,-1),-1) == 0 & (head(vals,-2) == tail(vals,-2)))
vals[idx + 1] <- vals[idx]

# finally go back to the original vector
v.rle$values = vals
inverse.rle(v.rle)
# [1] 0 0 1 1 1 1 1 2 2 2 2 0 0 1 1 1 1 1 1 2 2 2

Probably the least cumbersome thing to do is to put the above in a function and then apply it to the relevant column of your data.frame (as opposed to manipulating the vector explicitly).
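As a minimal sketch of that suggestion, the steps above could be wrapped as follows. The function name fill_between_marks is hypothetical (it does not appear in the original answer), and the example rebuilds the start data.frame from the question so that it runs on its own:

```r
# Hypothetical helper (name is an assumption): fill the zeros that lie
# between two equal marks, using the run-length encoding of the vector.
fill_between_marks <- function(v) {
  r <- rle(v)
  vals <- r$values
  n <- length(vals)
  if (n >= 3) {
    # a run of zeros is filled when the runs on both sides carry the same value
    idx <- which(vals[2:(n - 1)] == 0 & vals[1:(n - 2)] == vals[3:n])
    vals[idx + 1] <- vals[idx]
  }
  r$values <- vals
  inverse.rle(r)
}

# data from the question
ob.id <- 1:20
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)
start <- data.frame(ob.id, mark)

start$condition <- fill_between_marks(start$mark)
```

One caveat with this rule: it only compares the values on either side of a run of zeros, so two blocks of the same condition that directly follow each other (with no other condition in between) would be merged into one.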


Another approach, based on @SimonO101's observation, involves constructing the right groups from the starting data (run the by expression separately, piece by piece, to see how it works):

library(data.table)
dt = data.table(start)

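# group id: count the non-zero marks seen so far, then subtract 1 at every
# second (closing) mark so that it stays in the group opened by its partner
dt[, result := mark[1],
     by = {tmp = rep(0, length(mark));
           tmp[which(mark != 0)[c(FALSE, TRUE)]] = 1;
           cumsum(mark != 0) - tmp}]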
dt
#    ob.id mark result
# 1:     1    0      0
# 2:     2    0      0
# 3:     3    1      1
# 4:     4    0      1
# 5:     5    0      1
# 6:     6    1      1
# 7:     7    0      0
# 8:     8    0      0
# 9:     9    2      2
#10:    10    0      2
#11:    11    0      2
#12:    12    2      2
#13:    13    0      0
#14:    14    0      0
#15:    15    1      1
#16:    16    0      1
#17:    17    0      1
#18:    18    1      1
#19:    19    0      0
#20:    20    0      0

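The latter approach will probably be more flexible: it only assumes that the non-zero marks come in start/end pairs, so it does not depend on the values of the neighbouring marks.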
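For readers who prefer not to load data.table, the same grouping idea can be sketched in base R. This is an illustration under the same assumption that non-zero marks come in start/end pairs; the variable names nz, closing and grp are made up for this example, and ave() stands in for the by-group assignment:

```r
# data from the question
ob.id <- 1:20
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)
start <- data.frame(ob.id, mark)

nz <- start$mark != 0                    # TRUE at every start or end mark
closing <- rep(0, nrow(start))
closing[which(nz)[c(FALSE, TRUE)]] <- 1  # flag every second mark (the end marks)
grp <- cumsum(nz) - closing              # end mark joins the group of its start mark

# within each group, repeat the first mark: non-zero inside a block, 0 outside
start$condition <- ave(start$mark, grp,
                       FUN = function(x) rep(x[1], length(x)))
```

Printing grp on its own shows the group ids (0, 0, 1, 1, 1, 1, 2, 2, 3, ...), which may be the quickest way to see why taking the first mark of each group gives the desired column.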

eddi answered Oct 15 '22 19:10