df1 <- read.csv("C:/Users/uni/DS-project/df1.csv")
df1
year value
1 2000 1
2 2001 2
3 2002 3
4 2003 4
5 2004 5
6 2000 1
7 2001 2
8 2002 3
9 2003 4
10 2004 5
11 2000 1
12 2001 2
13 2002 3
14 2003 4
15 2004 5
16 2000 1
17 2001 2
18 2002 3
19 2003 4
20 2004 5
I want to apply lead() so that I get the output shown below.
The data are a set of 5 observations (years 2000 to 2004) repeated n times. In the output, the first set is kept whole; from the second set we drop 2000 and its value, from the third set we drop 2000 and 2001 and their values, from the fourth set we drop 2000, 2001 and 2002 and their values, and so on.
The desired output is:
year value
2000 1
2001 2
2002 3
2003 4
2004 5
2001 2
2002 3
2003 4
2004 5
2002 3
2003 4
2004 5
2003 4
2004 5
Please help.
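Just for fun, here is a vectorized solution using matrix subsetting:
# One column per group of 5 rows, all TRUE to start
m <- matrix(rep(TRUE, nrow(df1)), 5)
# upper.tri() blanks out the first j - 1 rows of column j (group j)
m[upper.tri(m)] <- FALSE
# The logical matrix is read column by column as a row selector
df1[m, ]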
# year value
# 1 2000 1
# 2 2001 2
# 3 2002 3
# 4 2003 4
# 5 2004 5
# 7 2001 2
# 8 2002 3
# 9 2003 4
# 10 2004 5
# 13 2002 3
# 14 2003 4
# 15 2004 5
# 19 2003 4
# 20 2004 5
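To see why the matrix trick works, this sketch prints the mask for the question's data (rebuilt here, assuming 4 groups of 5 rows) and the row indices it selects. Column j corresponds to group j, and upper.tri() is TRUE exactly where the row index is smaller than the column index, so column j loses its first j - 1 cells.

```r
# Same data as in the question, rebuilt so the snippet runs on its own.
df1 <- data.frame(year = rep(2000:2004, 4), value = rep(1:5, 4))

# One column per group of 5 rows; start with every cell TRUE.
m <- matrix(TRUE, nrow = 5, ncol = nrow(df1) / 5)
# upper.tri(m) is TRUE where row < column, i.e. the first j - 1 cells of column j.
m[upper.tri(m)] <- FALSE
print(m)

# Flattened column by column, the mask lines up with the rows of df1.
keep <- as.vector(m)
which(keep)
# 1 2 3 4 5 7 8 9 10 13 14 15 19 20
```

The selected indices match the row names in the output above.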
Below, grp is 1 for each row of the first group, 2 for each row of the second, and so on. Seq numbers the rows within each group as 1, 2, 3, .... We then keep only the rows for which Seq is at least as large as grp. This removes the first i - 1 rows from the ith group for i = 1, 2, ....
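grp <- cumsum(df1$year == 2000)        # group number: increments each time 2000 appears
Seq <- ave(grp, grp, FUN = seq_along)  # position of each row within its group
subset(df1, Seq >= grp)                # keep rows whose position is >= the group number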
We could alternatively write this in a less general form that hard-codes 5 rows per group and 4 groups:
subset(df1, 1:5 >= rep(1:4, each = 5))
In either case, the output of the subset statement is:
year value
1 2000 1
2 2001 2
3 2002 3
4 2003 4
5 2004 5
7 2001 2
8 2002 3
9 2003 4
10 2004 5
13 2002 3
14 2003 4
15 2004 5
19 2003 4
20 2004 5
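The helper vectors are easier to follow when shown next to the data. This sketch (again rebuilding df1 under the assumption of 4 groups of 5 rows) prints grp and Seq alongside each row and checks that the hard-coded form, which recycles 1:5 against rep(1:4, each = 5), yields the same logical selector for this data.

```r
df1 <- data.frame(year = rep(2000:2004, 4), value = rep(1:5, 4))

# Group counter: increments each time a block restarts at 2000.
grp <- cumsum(df1$year == 2000)
# Position of each row within its group: 1, 2, ..., 5.
Seq <- ave(grp, grp, FUN = seq_along)

# Show the helper columns and the resulting keep/drop flag next to the data.
cbind(df1, grp, Seq, keep = Seq >= grp)

# The hard-coded comparison recycles 1:5 over 20 rows (20 is a multiple of 5,
# so no warning) and gives the same selector as the general version here.
identical(Seq >= grp, 1:5 >= rep(1:4, each = 5))
```

The general version only assumes that each group starts at 2000, whereas the hard-coded one also fixes the group size and the number of groups.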