apply lag or lead in increasing order for the dataframe

Question

df1 <- read.csv("C:/Users/uni/DS-project/df1.csv")
df1

    year value
1  2000     1
2  2001     2
3  2002     3
4  2003     4
5  2004     5
6  2000     1
7  2001     2
8  2002     3
9  2003     4
10 2004     5
11 2000     1
12 2001     2
13 2002     3
14 2003     4
15 2004     5
16 2000     1
17 2001     2
18 2002     3
19 2003     4
20 2004     5
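
For readers who do not have the CSV file, the data frame printed above can be rebuilt by hand. This is only a sketch: it assumes the file holds exactly the 20 rows shown (the years 2000 to 2004 repeated four times), with integer columns.

```r
# Rebuild the example data without the CSV file.
# Assumption: 4 repetitions of the years 2000-2004 with values 1-5.
df1 <- data.frame(
  year  = rep(2000:2004, times = 4),
  value = rep(1:5, times = 4)
)
nrow(df1)  # 20 rows, matching the printout above
```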

I want to apply lead so that I get the output shown below.

The data consist of the same set of 5 yearly observations (2000 to 2004) repeated n times. In the output, the first repetition should be kept whole, the second should drop 2000 and its value, the third should drop 2000 and 2001 and their values, the fourth should drop 2000, 2001 and 2002 and their values, and so on.

Expected output:
year    value
2000    1
2001    2
2002    3
2003    4
2004    5
2001    2
2002    3
2003    4
2004    5
2002    3
2003    4
2004    5
2003    4
2004    5

Please help.

David Arenburg · Accepted Answer

Just for fun, here is a vectorized solution using matrix subsetting:

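# One column per repetition, one row per year within a repetition
m <- matrix(rep(TRUE, nrow(df1)), 5)
# Entries above the diagonal are the leading years to drop in each repetition
m[upper.tri(m)] <- FALSE
# The logical matrix is read column by column as a row selector
df1[m, ]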
#    year value
# 1  2000     1
# 2  2001     2
# 3  2002     3
# 4  2003     4
# 5  2004     5
# 7  2001     2
# 8  2002     3
# 9  2003     4
# 10 2004     5
# 13 2002     3
# 14 2003     4
# 15 2004     5
# 19 2003     4
# 20 2004     5
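
To see why the mask works, it may help to print it. The sketch below assumes the question's data frame df1 (rebuilt here by hand so the snippet runs on its own) and derives the number of columns from the row count rather than hard-coding it.

```r
# Assumption: df1 has the 20 rows shown in the question.
df1 <- data.frame(year = rep(2000:2004, times = 4), value = rep(1:5, times = 4))

# 5 rows (years) by 4 columns (repetitions), filled column by column
m <- matrix(TRUE, nrow = 5, ncol = nrow(df1) / 5)
m[upper.tri(m)] <- FALSE
m
#      [,1]  [,2]  [,3]  [,4]
# [1,] TRUE FALSE FALSE FALSE
# [2,] TRUE  TRUE FALSE FALSE
# [3,] TRUE  TRUE  TRUE FALSE
# [4,] TRUE  TRUE  TRUE  TRUE
# [5,] TRUE  TRUE  TRUE  TRUE

# Row positions that are kept, in column-major order
which(m)
# 1 2 3 4 5 7 8 9 10 13 14 15 19 20
```

Column j corresponds to the j-th repetition, so the j - 1 FALSE entries at the top of column j drop its first j - 1 years.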

G. Grothendieck · Answer

Below, grp is 1 for each row of the first group, 2 for the second, and so on; a new group starts at every row where year is 2000. Seq is 1, 2, 3, ... for the successive rows of each grp. Now just pick out those rows for which Seq is at least as large as grp. This has the effect of removing the first i-1 rows from the ith group for i = 1, 2, ... .

grp <- cumsum(df1$year == 2000)
Seq <- ave(grp, grp, FUN = seq_along)
subset(df1, Seq >= grp)

We could alternately write this in the less general form:

subset(df1, 1:5 >= rep(1:4, each = 5))

In any case the output from either subset statement is:

   year value
1  2000     1
2  2001     2
3  2002     3
4  2003     4
5  2004     5
7  2001     2
8  2002     3
9  2003     4
10 2004     5
13 2002     3
14 2003     4
15 2004     5
19 2003     4
20 2004     5
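
The intermediate vectors make the comparison easier to follow. This is a small sketch that again assumes the question's 20-row df1, rebuilt by hand so that it runs on its own; the name res is introduced here only for illustration.

```r
# Assumption: df1 has the 20 rows shown in the question.
df1 <- data.frame(year = rep(2000:2004, times = 4), value = rep(1:5, times = 4))

grp <- cumsum(df1$year == 2000)        # group number of each row
Seq <- ave(grp, grp, FUN = seq_along)  # position of each row within its group

grp
# 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
Seq
# 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

# Keep a row only when its position is at least its group number
res <- subset(df1, Seq >= grp)
rownames(res)
# "1" "2" "3" "4" "5" "7" "8" "9" "10" "13" "14" "15" "19" "20"
```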

Tags: dataframe, r, lag, lead

Robby star
