Row Referencing in R data.table package

Tags:

r

data.table

Say I have the following sample dataset:

iris <- data.table(iris)[c(1:5,51:55,101:105), list(ID=.I, Species,Sepal.Length)]

Then say that I want to calculate the absolute difference between rows within a group (in this case, Species).

iris[ , SL.Diff := c(NA,abs(diff(Sepal.Length))) , by = Species]

At this point, I have a dataset that looks like the following:

   ID    Species Sepal.Length SL.Diff
1:  1     setosa          5.1      NA
2:  2     setosa          4.9     0.2
3:  3     setosa          4.7     0.2
4:  4     setosa          4.6     0.1
5:  5     setosa          5.0     0.4
6:  6 versicolor          7.0      NA

Now I want to calculate a new variable Sepal.Length2 that takes on the next row's value if SL.Diff is less than a threshold of 0.3.

iris[ , Sepal.Length2 := ifelse(SL.Diff < 0.3, iris[ID+1]$Sepal.Length, Sepal.Length)]

This works the way I want it to. But what if I want to do the same comparison but instead of taking on the next row, I want to take on the value of the previous row?

iris[ , Sepal.Length3 := ifelse(SL.Diff < 0.3, iris[ID-1]$Sepal.Length, Sepal.Length)]

Sepal.Length3 does not give the output that I was expecting. Anyone know what I could be doing wrong here?

    ID    Species Sepal.Length SL.Diff Sepal.Length2 Sepal.Length3
 1:  1     setosa          5.1      NA            NA            NA
 2:  2     setosa          4.9     0.2           4.7           4.9
 3:  3     setosa          4.7     0.2           4.6           4.7
 4:  4     setosa          4.6     0.1           5.0           4.6
 5:  5     setosa          5.0     0.4           5.0           5.0
 6:  6 versicolor          7.0      NA            NA            NA
 7:  7 versicolor          6.4     0.6           6.4           6.4
 8:  8 versicolor          6.9     0.5           6.9           6.9
 9:  9 versicolor          5.5     1.4           5.5           5.5
10: 10 versicolor          6.5     1.0           6.5           6.5
11: 11  virginica          6.3      NA            NA            NA
12: 12  virginica          5.8     0.5           5.8           5.8
13: 13  virginica          7.1     1.3           7.1           7.1
14: 14  virginica          6.3     0.8           6.3           6.3
15: 15  virginica          6.5     0.2            NA           5.1

asked Aug 01 '14 03:08

Mike.Gahan

3 Answers

Not sure of the speed implications of this, but here's another attempt:

# make a column of the previous values (a within-group lag) using head()
iris[, S3 := c(NA,head(Sepal.Length,-1)), by=Species]
# overwrite those values not meeting your criteria with the original values
iris[ !(SL.Diff < 0.3), S3 := Sepal.Length]

iris
#    ID    Species Sepal.Length SL.Diff  S3
# 1:  1     setosa          5.1      NA  NA
# 2:  2     setosa          4.9     0.2 5.1
# 3:  3     setosa          4.7     0.2 4.9
# 4:  4     setosa          4.6     0.1 4.7
# 5:  5     setosa          5.0     0.4 5.0
# 6:  6 versicolor          7.0      NA  NA
# 7:  7 versicolor          6.4     0.6 6.4
# 8:  8 versicolor          6.9     0.5 6.9
# 9:  9 versicolor          5.5     1.4 5.5
#10: 10 versicolor          6.5     1.0 6.5
#11: 11  virginica          6.3      NA  NA
#12: 12  virginica          5.8     0.5 5.8
#13: 13  virginica          7.1     1.3 7.1
#14: 14  virginica          6.3     0.8 6.3
#15: 15  virginica          6.5     0.2 6.3
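
For reference, the same head() trick has a mirror image for the next row's value using tail(). The following is a minimal base R sketch, assuming only the five setosa values from the question; the names prev_x and next_x are illustrative and do not appear in the original answer.

```r
# setosa Sepal.Length values from the question
x <- c(5.1, 4.9, 4.7, 4.6, 5.0)

# previous row's value: drop the last element and pad the front with NA
prev_x <- c(NA, head(x, -1))

# next row's value: drop the first element and pad the end with NA
next_x <- c(tail(x, -1), NA)
```

Both vectors keep the length of x, so they stay aligned with the rows when used inside j with by = Species.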

answered Sep 25 '22 23:09

thelatemail

`[.data.table` evaluates i and j in the scope of the data.table in question.

Therefore

iris[ID+1]$Sepal.Length evaluates ID in the scope of iris (for a second time).

Your issue really arises because ID-1 creates a 0 index for the first row (which is silently dropped by R), so the result is one element shorter than the table and no longer lines up with its rows.

a <- c('a','b')
a[0:1]
# [1] "a"
a[1]
# [1] "a"
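
To see how the dropped 0 index produces the Sepal.Length3 column from the question, here is a base R sketch using plain vectors in place of the data.table columns. The values are copied from the question's output; the names sl, sl_diff, prev and res are illustrative.

```r
# Sepal.Length and SL.Diff for the 15 sample rows in the question
sl      <- c(5.1, 4.9, 4.7, 4.6, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 6.3, 5.8, 7.1, 6.3, 6.5)
sl_diff <- c(NA, 0.2, 0.2, 0.1, 0.4, NA, 0.6, 0.5, 1.4, 1.0, NA, 0.5, 1.3, 0.8, 0.2)
id      <- seq_along(sl)

# id - 1 starts at 0; the 0 is dropped, leaving rows 1..14 in their original order
prev <- sl[id - 1]
length(prev)
# [1] 14

# prev[k] is therefore sl[k], not sl[k - 1]; ifelse() recycles the 14 values
# to length 15, so position 15 wraps around to prev[1]
res <- ifelse(sl_diff < 0.3, prev, sl)
res[c(2, 15)]
# [1] 4.9 5.1
```

This matches the question's output: row 2 keeps its own value 4.9, and row 15 picks up 5.1 from row 1.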

So, you need to deal better with "known NA values" and implied NA values.

Here is an approach

# calculate the "threshold" column
iris[,thresh := SL.Diff <0.3]
# where the threshold is met, record the row index to take the "up" value from
iris[!is.na(thresh), up := ifelse(thresh, ID+1L,ID)]
# create the column
iris[, S2 := Sepal.Length[up]]
# the same for "down"

iris[!is.na(thresh), down := ifelse(thresh, ID-1L,ID)]
iris[, S3 := Sepal.Length[down]]

iris
#     ID    Species Sepal.Length SL.Diff thresh up  S2 down  S3
#  1:  1     setosa          5.1      NA     NA NA  NA   NA  NA
#  2:  2     setosa          4.9     0.2   TRUE  3 4.7    1 5.1
#  3:  3     setosa          4.7     0.2   TRUE  4 4.6    2 4.9
#  4:  4     setosa          4.6     0.1   TRUE  5 5.0    3 4.7
#  5:  5     setosa          5.0     0.4  FALSE  5 5.0    5 5.0
#  6:  6 versicolor          7.0      NA     NA NA  NA   NA  NA
#  7:  7 versicolor          6.4     0.6  FALSE  7 6.4    7 6.4
#  8:  8 versicolor          6.9     0.5  FALSE  8 6.9    8 6.9
#  9:  9 versicolor          5.5     1.4  FALSE  9 5.5    9 5.5
# 10: 10 versicolor          6.5     1.0  FALSE 10 6.5   10 6.5
# 11: 11  virginica          6.3      NA     NA NA  NA   NA  NA
# 12: 12  virginica          5.8     0.5  FALSE 12 5.8   12 5.8
# 13: 13  virginica          7.1     1.3  FALSE 13 7.1   13 7.1
# 14: 14  virginica          6.3     0.8  FALSE 14 6.3   14 6.3
# 15: 15  virginica          6.5     0.2   TRUE 16  NA   14 6.3
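
The NA entries in S2 and S3 above come from two ordinary base R indexing rules: an NA index returns NA, and so does a positive index past the end of the vector (as with up = 16 in row 15). A small sketch with made-up values, where x, idx and picked are illustrative names:

```r
x   <- c(5.1, 4.9, 4.7)    # three illustrative values
idx <- c(NA, 1L, 4L)       # a missing index, a valid index, one past the end

# unlike a 0 index, neither case is dropped: each yields NA in its own position
picked <- x[idx]
picked
# [1]  NA 5.1  NA
```

Because the result keeps one element per index, it stays aligned with the rows, which is what the 0-index version fails to do.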

answered Sep 25 '22 23:09

mnel

I think dplyr makes this a little easier to express by providing lead() and lag() functions:

library(dplyr)
iris2 <- iris[c(1:5, 51:55, 101:105), c("Species", "Sepal.Length")]
names(iris2) <- c("species", "sepal")
iris2$id <- 1:15

iris2 %>%
  group_by(species) %>%
  mutate(
    thres = abs(sepal - lag(sepal)),
    up =   ifelse(thres < 0.3, lead(sepal), sepal),
    down = ifelse(thres < 0.3, lag(sepal), sepal)
  )
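
For anyone staying within data.table, later releases provide shift(), which plays the same role as lag() and lead(). The sketch below is not part of this answer; it assumes a data.table version that includes shift(), and the names dt, prev, nxt, S2 and S3 are illustrative.

```r
library(data.table)

# rebuild the question's 15-row sample from the built-in iris data
dt <- as.data.table(iris)[c(1:5, 51:55, 101:105), .(Species, Sepal.Length)]
dt[, SL.Diff := c(NA, abs(diff(Sepal.Length))), by = Species]

# previous and next values, computed within each Species group
dt[, prev := shift(Sepal.Length, type = "lag"), by = Species]
dt[, nxt := shift(Sepal.Length, type = "lead"), by = Species]

# take the neighbouring value where the difference is under the threshold;
# rows with NA in SL.Diff stay NA, as in the question
dt[, S2 := ifelse(SL.Diff < 0.3, nxt, Sepal.Length)]
dt[, S3 := ifelse(SL.Diff < 0.3, prev, Sepal.Length)]
```

Since the shifts are done by group, the first and last rows of each species get NA instead of a value borrowed from a neighbouring species.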

answered Sep 25 '22 23:09

hadley
