I have data that looks like the following:
  Region X2012 X2013 X2014 X2015 X2016 X2017
1      1    10    11    12    13    14    15
2      2    NA    17    14    NA    23    NA
3      3    12    18    18    NA    23    NA
4      4    NA    NA    15    28    NA    38
5      5    14  18.5    16    27    25    39
6      6    15    NA    17  27.5    NA    39
The numbers themselves are irrelevant. What I am trying to do is take the difference between the earliest and latest observed (non-NA) values in each row and store it in a new column, like this:
Region Diff
1 (15 - 10) = 5
2 (23 - 17) = 6
and so on, showing only the final result rather than the subtraction. Ideally I would just subtract the 2012 column from the 2017 column, but since a row's first observation could start at any column and its last could end at any column, I am unsure how to take the difference.
A dplyr solution would be ideal but any solution at all is appreciated.
Define a function that takes a vector, omits its NAs, and returns the last remaining element minus the first. Then apply it to each row, excluding the Region column:
lastMinusFirst <- function(x, y = na.omit(x)) tail(y, 1) - y[1]
transform(DF, diff = apply(DF[-1], 1, lastMinusFirst))
giving:
  Region X2012 X2013 X2014 X2015 X2016 X2017 diff
1      1    10  11.0    12  13.0    14    15    5
2      2    NA  17.0    14    NA    23    NA    6
3      3    12  18.0    18    NA    23    NA   11
4      4    NA    NA    15  28.0    NA    38   23
5      5    14  18.5    16  27.0    25    39   25
6      6    15    NA    17  27.5    NA    39   24
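To see why this works, it may help to trace the helper on a single row. The sketch below repeats lastMinusFirst from the answer and adds a variant, lastMinusFirstSafe, which is a hypothetical name and not part of the original answer. It assumes you would want NA for a row with no observed values at all; the original helper returns a zero-length result in that case, which apply cannot simplify to a plain numeric vector.

```r
# Helper from the answer: last non-NA value minus first non-NA value.
lastMinusFirst <- function(x, y = na.omit(x)) tail(y, 1) - y[1]

# Row 2 of the example data (columns X2012..X2017).
row2 <- c(NA, 17, 14, NA, 23, NA)
na.omit(row2)          # keeps 17, 14, 23 in their original order
lastMinusFirst(row2)   # 23 - 17, i.e. 6

# Hypothetical variant (an assumption, not from the answer):
# return NA when a row has no observed values at all.
lastMinusFirstSafe <- function(x) {
  y <- x[!is.na(x)]                       # drop NAs, keep order
  if (length(y) == 0) return(NA_real_)    # nothing observed in this row
  y[length(y)] - y[1]                     # last observed minus first observed
}

lastMinusFirstSafe(row2)             # same result as lastMinusFirst(row2)
lastMinusFirstSafe(c(NA, NA, NA))    # NA instead of a zero-length vector
```

The variant can be dropped into the same apply call in place of lastMinusFirst if all-NA rows are possible in your data.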
The input in reproducible form:
Lines <- "Region X2012 X2013 X2014 X2015 X2016 X2017
1 1 10 11 12 13 14 15
2 2 NA 17 14 NA 23 NA
3 3 12 18 18 NA 23 NA
4 4 NA NA 15 28 NA 38
5 5 14 18.5 16 27 25 39
6 6 15 NA 17 27.5 NA 39"
DF <- read.table(text = Lines)
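Since the question asks for dplyr, here is one possible sketch of the same idea, offered as an alternative rather than as the answer's method. It assumes the dplyr and tidyr packages are installed and uses the data from the question; the names diffs, year and value are illustrative choices. The data is reshaped to long form, NAs are dropped, and the first observed value is subtracted from the last within each Region.

```r
library(dplyr)
library(tidyr)

# Data from the question (row 6 as shown in the question's table).
DF <- read.table(text = "Region X2012 X2013 X2014 X2015 X2016 X2017
1 1 10 11 12 13 14 15
2 2 NA 17 14 NA 23 NA
3 3 12 18 18 NA 23 NA
4 4 NA NA 15 28 NA 38
5 5 14 18.5 16 27 25 39
6 6 15 NA 17 27.5 NA 39")

diffs <- DF %>%
  pivot_longer(-Region, names_to = "year", values_to = "value") %>%
  filter(!is.na(value)) %>%            # keep observed values only
  group_by(Region) %>%                 # values stay in column (year) order
  summarise(diff = last(value) - first(value))

# Attach the result to the original wide data.
left_join(DF, diffs, by = "Region")
```

Note that a Region whose values are all NA is dropped by the filter step, so after the join it would get diff = NA.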