
Targeted variable recoding in R

I am trying to do some text processing and need to recode the words of sentences so that a target word is identified in a particular way in the new variable. For instance, given a data frame that looks like this...

subj <- c("1", "1", "1", "2", "2", "2", "2", "2")
condition <- c("A", "A", "A", "B", "B", "B", "B", "B")
sentence <- c("1", "1", "1", "2", "2", "2", "2", "2")
word <- c("I", "like", "dogs.", "We", "don't", "like", "this", "song.")
d <- data.frame(subj, condition, sentence, word)

 subj condition sentence  word
 1         A        1     I
 1         A        1     like
 1         A        1     dogs.
 2         B        2     We
 2         B        2     don't
 2         B        2     like
 2         B        2     this
 2         B        2     song.

I need to create a new column in which every instance of the target word (in this example, where d$word == "like") is marked 0, all words before "like" in the sentence block count down (-1, -2, ...) moving away from it, and all words after "like" count up (1, 2, ...). Each subject has multiple sentences, and sentences vary by condition, so the solution needs to locate the target word per subject, per sentence. The end result should look something like this.

 subj condition sentence  word   position
 1         A        1     I        -1
 1         A        1     like      0
 1         A        1     dogs.     1
 2         B        2     We       -2
 2         B        2     don't    -1
 2         B        2     like      0
 2         B        2     this      1
 2         B        2     song.     2

Sorry if the question is poorly worded; I hope it makes sense! Note that the target is not in the same place (relative to the start of the sentence) in each sentence. I am pretty new to R and can figure out how to increment or decrement, but not how to do both within each sentence block. Any suggestions on the best way to go about this? Thanks much!

asked Mar 04 '13 02:03 by amurphy





1 Answer

You can add a row index and then compute each word's position relative to the index of the target word.
Using data.table makes it very easy to do this separately for each sentence:

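library(data.table)
# Add a running row index (indx) in front of the original columns
DT <- data.table(indx=1:nrow(d), d, key="indx")

# Within each sentence, subtract the index of "like" from every row's index.
# If sentence ids repeat across subjects, group by both: by=list(subj, sentence)
DT[, position:=(indx - indx[word=="like"]), by=sentence]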

# Results
DT
#    indx subj condition sentence  word position
# 1:    1    1         A        1     I       -1
# 2:    2    1         A        1  like        0
# 3:    3    1         A        1 dogs.        1
# 4:    4    2         B        2    We       -2
# 5:    5    2         B        2 don't       -1
# 6:    6    2         B        2  like        0
# 7:    7    2         B        2  this        1
# 8:    8    2         B        2 song.        2
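
For comparison, the same relative-position idea can be sketched in base R with ave(). This is only an illustrative alternative, not part of the approach above: it assumes the target is matched exactly with ==, groups by both subj and sentence, uses the first match if a block contains the target more than once, and leaves NA for a block that has no target. The names idx and target_idx are made up for this sketch.

```r
# Rebuild the example data from the question.
d <- data.frame(
  subj      = c("1", "1", "1", "2", "2", "2", "2", "2"),
  condition = c("A", "A", "A", "B", "B", "B", "B", "B"),
  sentence  = c("1", "1", "1", "2", "2", "2", "2", "2"),
  word      = c("I", "like", "dogs.", "We", "don't", "like", "this", "song."),
  stringsAsFactors = FALSE
)

# Running row index, as in the data.table version.
idx <- seq_len(nrow(d))

# For every row, look up the row index of the first "like" in its
# subject/sentence block (NA if the block contains no target word).
target_idx <- ave(
  ifelse(d$word == "like", idx, NA_integer_),
  d$subj, d$sentence,
  FUN = function(x) x[!is.na(x)][1]
)

# Negative before the target, 0 at the target, positive after it.
d$position <- idx - target_idx
d
```

Because the subtraction is done on a running row index, this relies on the rows of each sentence being contiguous and in reading order, just as the data.table version does.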

Update:

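In case the target word does not always appear as an exact match (for example, in grammatically incorrect sentences or with punctuation attached, as in "like,"), you might want to use grepl instead of ==. Note that grepl("like", word) matches substrings, so it also flags words such as "likes" or "dislike".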

DT[, position:=(indx - indx[grepl("like", word)]), by=sentence]
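
If substring matching turns out to be too loose, one possible middle ground is to strip surrounding punctuation and compare exactly. The sketch below is an assumption-laden illustration: is_target is a hypothetical helper (not from data.table or base R), it ignores case, and it takes only the first match when a sentence contains the target more than once.

```r
# Hypothetical helper: TRUE where a token equals the target once case
# and leading/trailing punctuation are ignored.
is_target <- function(word, target = "like") {
  cleaned <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", tolower(word))
  cleaned == target
}

# Made-up sentence with a near miss ("dislike") and attached punctuation ("like,").
words <- c("I", "dislike", "what", "they", "like,", "really.")

substring_hits <- grepl("like", words)  # flags both "dislike" and "like,"
exact_hits     <- is_target(words)      # flags only "like,"

# Position relative to the first exact hit in this sentence.
position <- seq_along(words) - which(exact_hits)[1]
```

Inside the grouped call above, the same helper could presumably be used as indx[is_target(word)][1] in place of indx[grepl("like", word)].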
answered Oct 15 '22 23:10 by Ricardo Saporta
