I have data consisting of a single vector / column in a tibble:
library(dplyr)  # dplyr re-exports tibble()
my_tibble <- tibble(score = c(1, 2, 3, 4, 9, 8, 7, 6, 5, 4))
For every row of my_tibble$score I want to calculate the difference between that row's value and the largest of the values that follow it (the "leading" elements) in the same column. The new column should be called "difference". For example, the first row of difference should be 1 - 9, the fifth row should be 9 - 8, and the last row should be NA, because no value follows the final 4.
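Written as a formula, with s_i the score in row i and n the number of rows (this notation is introduced here for illustration, not taken from the original question), the desired column is:

```latex
d_i =
\begin{cases}
  s_i - \max_{j > i} s_j, & i < n,\\
  \text{NA}, & i = n,
\end{cases}
\qquad\text{e.g. } d_1 = 1 - 9 = -8,\quad d_5 = 9 - 8 = 1.
```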
In the end, the new tibble should look like this:
score | difference
<dbl> |      <dbl>
    1 |         -8
    2 |         -7
    3 |         -6
    4 |         -5
    9 |          1
    8 |          1
    7 |          1
    6 |          1
    5 |          1
    4 |         NA
I want to achieve this using dplyr and have so far tried many variations of mutate like
my_tibble %>%
mutate(difference = score[which(score > score)])
I was hoping to find some way for the second "score" inside which() to refer to the current row being mutated. However, I was unsuccessful after hours of trying and desperately searching for a solution online.
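To see why that attempt cannot work, here is a small base-R check on a plain vector holding the same values as my_tibble$score (the names self_cmp, i and later are illustrative only). Inside mutate(), both occurrences of score refer to the whole column, so score > score compares every element with itself and selects nothing. What is actually needed for a given row is the set of values after it:

```r
score <- c(1, 2, 3, 4, 9, 8, 7, 6, 5, 4)

# Both sides are the full vector, so each element is compared with itself:
# the result is FALSE everywhere and which() returns integer(0).
self_cmp <- score > score
print(which(self_cmp))

# For a fixed row i, the "leading" values are everything after position i.
i <- 5
later <- score[-(1:i)]          # 8 7 6 5 4
print(score[i] - max(later))    # 9 - 8 = 1
```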
The closest I found was dplyr: Summing n leading values, but that still leaves me with the problem that I want the difference to the maximum of all leading values, not just of the n closest ones.
Help and/or a referral to wherever this has been answered or addressed before is greatly appreciated!
My solution:
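library(dplyr)
library(purrr)
# For row y, subtract the maximum of all later rows; the last row has none, so NA
my_tibble <- my_tibble %>%
  mutate(difference = map_dbl(seq_along(score), function(y)
    if (y == length(score)) NA_real_ else score[y] - max(score[-(1:y)])))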
Output
> my_tibble
# A tibble: 10 x 2
   score difference
   <dbl>      <dbl>
 1     1         -8
 2     2         -7
 3     3         -6
 4     4         -5
 5     9          1
 6     8          1
 7     7          1
 8     6          1
 9     5          1
10     4         NA
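A vectorized alternative is sketched below, assuming score contains no missing values (cummax() propagates NA to every later position). The maximum of all values from a row to the end is a reversed cumulative maximum, rev(cummax(rev(score))); shifting it up one row with dplyr's lead() gives the maximum of the rows strictly below, and lead() fills the last row with NA automatically:

```r
library(dplyr)

my_tibble <- tibble(score = c(1, 2, 3, 4, 9, 8, 7, 6, 5, 4))

my_tibble <- my_tibble %>%
  mutate(
    # rev(cummax(rev(score))): running maximum from each row to the end.
    # lead(...): the same maximum, but over the rows strictly after each row;
    # the last row has no successor, so lead() yields NA there.
    difference = score - lead(rev(cummax(rev(score))))
  )

print(my_tibble)
```

Because it uses only whole-column operations, this runs in a single pass over the data instead of recomputing max() over the remaining rows for every row as the map_dbl() version does.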