assigning new values based on the location in the sequence

Working in R. The data track changes in brain activity over time. The column "mark" indicates when a particular treatment begins and ends. For example, the first condition (mark==1) begins in row 3 and ends in row 6. The second experimental condition (mark==2) starts in row 9 and ends in row 12. Another batch of treatment one is repeated between rows 15 and 18.

ob.id <- 1:20
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)
condition <- c(0,0,1,1,1,1,0,0,2,2,2,2,0,0,1,1,1,1,0,0)
start <- data.frame(ob.id, mark)
result <- data.frame(ob.id, mark, condition)
> print(start)
   ob.id mark
1      1    0
2      2    0
3      3    1
4      4    0
5      5    0
6      6    1
7      7    0
8      8    0
9      9    2
10    10    0
11    11    0
12    12    2
13    13    0
14    14    0
15    15    1
16    16    0
17    17    0
18    18    1
19    19    0
20    20    0

I need to create a column that indicates which experimental condition each observation belongs to (0 for observations outside any treatment), like this:

> print(result)
   ob.id mark condition
1      1    0         0
2      2    0         0
3      3    1         1
4      4    0         1
5      5    0         1
6      6    1         1
7      7    0         0
8      8    0         0
9      9    2         2
10    10    0         2
11    11    0         2
12    12    2         2
13    13    0         0
14    14    0         0
15    15    1         1
16    16    0         1
17    17    0         1
18    18    1         1
19    19    0         0
20    20    0         0

Thanks for your help!

This is a fun little problem. The trick I use below is to first compute the rle (run-length encoding) of the mark vector. This simplifies the problem, because every run of zeros collapses to a single 0 in the resulting values vector, and that 0 either needs to be replaced or not, depending on the values on either side of it.

# example vector with some edge cases
v = c(0,0,1,0,0,0,1,2,0,0,2,0,0,1,0,0,0,0,1,2,0,2)

v.rle = rle(v)
v.rle
#Run Length Encoding
#  lengths: int [1:14] 2 1 3 1 1 2 1 2 1 4 ...
#  values : num [1:14] 0 1 0 1 2 0 2 0 1 0 ...

vals = rle(v)$values

# find the 0's that need to be replaced and replace by the previous value
idx = which(tail(head(vals,-1),-1) == 0 & (head(vals,-2) == tail(vals,-2)))
vals[idx + 1] <- vals[idx]

# finally go back to the original vector
v.rle$values = vals
inverse.rle(v.rle)
# [1] 0 0 1 1 1 1 1 2 2 2 2 0 0 1 1 1 1 1 1 2 2 2

Probably the least cumbersome thing to do is to wrap the above in a function and apply it to the relevant column of your data.frame (as opposed to manipulating the vector step by step).
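As a rough sketch of that suggestion, the rle steps could be wrapped like this. The function name fill_between_marks and the column name condition are placeholders chosen for illustration, not something from the original code, and the sketch assumes that 0 always means "no marker".

```r
# Hypothetical helper: fills the zeros that sit between two equal markers.
fill_between_marks <- function(mark) {
  r <- rle(mark)
  vals <- r$values
  n <- length(vals)
  if (n >= 3) {
    mid  <- vals[2:(n - 1)]  # candidate runs (anything but the first and last run)
    prev <- vals[1:(n - 2)]  # run just before each candidate
    nxt  <- vals[3:n]        # run just after each candidate
    # a run of zeros is filled only when the runs on both sides carry the same value
    idx <- which(mid == 0 & prev == nxt)
    vals[idx + 1] <- vals[idx]
  }
  r$values <- vals
  inverse.rle(r)
}

# Applied to the data from the question
start <- data.frame(ob.id = 1:20,
                    mark = c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0))
start$condition <- fill_between_marks(start$mark)
start$condition
# [1] 0 0 1 1 1 1 0 0 2 2 2 2 0 0 1 1 1 1 0 0
```

The length check only guards against very short inputs with fewer than three runs, where there is nothing between two markers to fill.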

Another approach, based on @SimonO101's observation, involves constructing the right groups from the starting data (run the by part separately, piece by piece, to see how it works):

library(data.table)
dt = data.table(start)

dt[, result := mark[1],
     by = {tmp = rep(0, length(mark));
           # flag every second marker, i.e. the one that closes a block
           tmp[which(mark != 0)[c(FALSE, TRUE)]] = 1;
           cumsum(mark != 0) - tmp}]
dt
#    ob.id mark result
# 1:     1    0      0
# 2:     2    0      0
# 3:     3    1      1
# 4:     4    0      1
# 5:     5    0      1
# 6:     6    1      1
# 7:     7    0      0
# 8:     8    0      0
# 9:     9    2      2
#10:    10    0      2
#11:    11    0      2
#12:    12    2      2
#13:    13    0      0
#14:    14    0      0
#15:    15    1      1
#16:    16    0      1
#17:    17    0      1
#18:    18    1      1
#19:    19    0      0
#20:    20    0      0

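The latter approach is probably more flexible: it pairs each opening marker with the next closing marker by position rather than by comparing values, so it still works when two consecutive treatment blocks share the same condition code (a case in which the rle approach would also fill the gap between them).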
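To see how the grouping expression in the data.table call works, its pieces can be run one at a time in base R. This is only an illustrative sketch: the intermediate names (is_marker, marker_pos, closing_pos, grp) are made up here, and ave() stands in for the data.table assignment.

```r
mark <- c(0,0,1,0,0,1,0,0,2,0,0,2,0,0,1,0,0,1,0,0)

is_marker   <- mark != 0                     # TRUE at every start/end marker
marker_pos  <- which(is_marker)              # rows 3 6 9 12 15 18
closing_pos <- marker_pos[c(FALSE, TRUE)]    # every second marker closes a block: 6 12 18

tmp <- rep(0, length(mark))
tmp[closing_pos] <- 1

# cumsum() starts a new group at every marker; subtracting tmp pulls each
# closing marker back into the group opened by its starting marker
grp <- cumsum(is_marker) - tmp
grp
# [1] 0 0 1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 6 6

# every row takes the first mark value of its group
condition <- ave(mark, grp, FUN = function(x) rep(x[1], length(x)))
condition
# [1] 0 0 1 1 1 1 0 0 2 2 2 2 0 0 1 1 1 1 0 0
```

Groups that begin with a marker inherit that marker's value, while the gaps between blocks begin with a 0 and therefore stay 0.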

Tags: for-loop, r, conditional-statements, if-statement, data-manipulation

Asked by andrey; 1 answer, by eddi.
