I have a large observational data set, structured like this simplified example:
> df = data.frame(ID = c("oak", "birch", rep("oak",2), "pine", "birch", "oak", rep("pine",2), "birch", "oak"),
+ yearobs = c(rep(1998,3), rep(1999,2), rep(2000,3),rep(2001,2), 2002))
> df
ID yearobs
1 oak 1998
2 birch 1998
3 oak 1998
4 oak 1999
5 pine 1999
6 birch 2000
7 oak 2000
8 pine 2000
9 pine 2001
10 birch 2001
11 oak 2002
What I want to do is calculate the age for each unique ID (tree species in this example) as the difference between the years, max(yearobs) - min(yearobs). I have tried working with the lubridate and dplyr packages. However, the number of observations per ID varies in my data, and I want to create the age column as fast as possible without storing the minimum and maximum values separately (I am avoiding for loops because my data is huge).
Desired output:
ID age
1 oak 4
2 birch 3
3 pine 2
Any suggestion would be appreciated.
In base R you can do:
aggregate(yearobs ~ ID, data = df, FUN = function(x) max(x) - min(x))
# ID yearobs
# 1 birch 3
# 2 oak 4
# 3 pine 2
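Note that the result column above is still called yearobs rather than age. A minimal sketch of one way to get the column name from the desired output, assuming the same sample df as in the question: diff(range(x)) is used here as an equivalent of max(x) - min(x), and the names res and res_dplyr are only illustrative. The dplyr variant is guarded so the snippet still runs when that package is not installed.

```r
# Sample data from the question
df <- data.frame(
  ID = c("oak", "birch", rep("oak", 2), "pine", "birch", "oak",
         rep("pine", 2), "birch", "oak"),
  yearobs = c(rep(1998, 3), rep(1999, 2), rep(2000, 3), rep(2001, 2), 2002)
)

# Base R: range(x) returns c(min, max), so diff(range(x)) is max(x) - min(x)
res <- aggregate(yearobs ~ ID, data = df, FUN = function(x) diff(range(x)))

# Rename the aggregated column to match the desired output
names(res)[names(res) == "yearobs"] <- "age"
res
#      ID age
# 1 birch   3
# 2   oak   4
# 3  pine   2

# Optional dplyr variant (only runs if dplyr is installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  res_dplyr <- dplyr::summarise(
    dplyr::group_by(df, ID),
    age = max(yearobs) - min(yearobs)
  )
  print(res_dplyr)
}
```

Both variants compute the minimum and maximum per group in a single pass over the groups, without storing them in separate columns. Which one is faster on a very large data set is not something this sketch measures.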