
Age calculation for observation data in R [duplicate]

Tags: dataframe, r

I have a very large but simply structured observation dataset. A hypothetical example is below:

> df = data.frame(ID = c("oak", "birch", rep("oak",2), "pine", "birch", "oak", rep("pine",2), "birch", "oak"),
+                 yearobs = c(rep(1998,3), rep(1999,2), rep(2000,3),rep(2001,2), 2002))
> df
      ID yearobs
1    oak    1998
2  birch    1998
3    oak    1998
4    oak    1999
5   pine    1999
6  birch    2000
7    oak    2000
8   pine    2000
9   pine    2001
10 birch    2001
11   oak    2002

I want to calculate an age for each unique ID (tree species in this example) as the difference between its latest and earliest observation years, i.e. max(yearobs) - min(yearobs). I have tried the lubridate and dplyr packages, but the number of observations per ID varies in my data. I am looking for the fastest way to create an age column without storing the minimum and maximum values separately, and without for loops, since my data is huge.
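For a single species, the age described above is simply the span of its observation years. Here is a minimal base R sketch for oak that rebuilds the example `df`. The names `oak_years` and `oak_age` are only illustrative:

```r
# Example data from the question
df <- data.frame(
  ID = c("oak", "birch", rep("oak", 2), "pine", "birch", "oak",
         rep("pine", 2), "birch", "oak"),
  yearobs = c(rep(1998, 3), rep(1999, 2), rep(2000, 3), rep(2001, 2), 2002)
)

# Years in which oak was observed: 1998, 1998, 1999, 2000, 2002
oak_years <- df$yearobs[df$ID == "oak"]

# Age = latest observation year minus earliest observation year
oak_age <- max(oak_years) - min(oak_years)
oak_age  # 4
```

The question is how to do this for every ID at once without a loop.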

Desired output:

     ID age
1   oak   4
2 birch   3
3  pine   2

Any suggestion would be appreciated.

asked Dec 22 '22 at 21:12 by DSA



1 Answer

In base R you can do:

aggregate(yearobs ~ ID, data = df, FUN = function(x) max(x) - min(x))
#      ID yearobs
# 1 birch       3
# 2   oak       4
# 3  pine       2
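The same idea can be extended slightly in base R. This is a sketch, not part of the original answer. `ave()` applies the function within each ID and writes the result back onto every row, which gives a per-row age column without storing min/max columns. Renaming the `aggregate()` result gives the `age` column name from the desired output. The helper `span` is a hypothetical name used only here. Also note that `aggregate()` orders groups alphabetically by ID, not by first appearance.

```r
df <- data.frame(
  ID = c("oak", "birch", rep("oak", 2), "pine", "birch", "oak",
         rep("pine", 2), "birch", "oak"),
  yearobs = c(rep(1998, 3), rep(1999, 2), rep(2000, 3), rep(2001, 2), 2002)
)

# Hypothetical helper: span of years within one group
span <- function(x) max(x) - min(x)

# 1) Per-row age column: ave() computes span() within each ID
#    and recycles the result to every row of that ID
df$age <- ave(df$yearobs, df$ID, FUN = span)

# 2) One row per ID, with the result column renamed to "age"
ages <- aggregate(yearobs ~ ID, data = df, FUN = span)
names(ages)[names(ages) == "yearobs"] <- "age"
ages
#      ID age
# 1 birch   3
# 2   oak   4
# 3  pine   2
```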
answered Jan 10 '23 at 03:01 by sindri_baldur
