 

Apply lag or lead in increasing order for the data frame

df1 <- read.csv("C:/Users/uni/DS-project/df1.csv")
df1

    year value
1  2000     1
2  2001     2
3  2002     3
4  2003     4
5  2004     5
6  2000     1
7  2001     2
8  2002     3
9  2003     4
10 2004     5
11 2000     1
12 2001     2
13 2002     3
14 2003     4
15 2004     5
16 2000     1
17 2001     2
18 2002     3
19 2003     4
20 2004     5

I want to apply lead so that I get the output in the form shown below.

We have a set of 5 observations (years 2000 to 2004) repeated n times. In the output, the first set is kept as is. From the second set we remove 2000 and its value, from the third set we remove 2000 and 2001 and their values, from the fourth set we remove 2000, 2001 and 2002 and their values, and so on.

The desired output is:
year    value
2000    1
2001    2
2002    3
2003    4
2004    5
2001    2
2002    3
2003    4
2004    5
2002    3
2003    4
2004    5
2003    4
2004    5

Please help.

asked Nov 19 '25 20:11 by Robby star


2 Answers

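Just for fun, here is a vectorized solution using matrix subsetting:

m <- matrix(rep(TRUE, nrow(df1)), 5)  # 5 rows per group, one column per group
m[upper.tri(m)] <- FALSE              # column j loses its first j - 1 rows
df1[m,]                               # the mask is read column by column, i.e. in row order of df1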
#    year value
# 1  2000     1
# 2  2001     2
# 3  2002     3
# 4  2003     4
# 5  2004     5
# 7  2001     2
# 8  2002     3
# 9  2003     4
# 10 2004     5
# 13 2002     3
# 14 2003     4
# 15 2004     5
# 19 2003     4
# 20 2004     5
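To see why the matrix trick works, it may help to print the mask itself. This is a sketch that assumes df1 is rebuilt inline from the data shown in the question (rather than read from the CSV) and that every group has exactly 5 rows; the names k and res are hypothetical.

```r
# Rebuild the question's data inline (assumption: 4 repetitions of 2000:2004)
df1 <- data.frame(year  = rep(2000:2004, times = 4),
                  value = rep(1:5, times = 4))

k <- 5                                              # rows per group (assumed constant)
m <- matrix(TRUE, nrow = k, ncol = nrow(df1) / k)   # one column per group
m[upper.tri(m)] <- FALSE                            # TRUE only where row >= column
m
#      [,1]  [,2]  [,3]  [,4]
# [1,] TRUE FALSE FALSE FALSE
# [2,] TRUE  TRUE FALSE FALSE
# [3,] TRUE  TRUE  TRUE FALSE
# [4,] TRUE  TRUE  TRUE  TRUE
# [5,] TRUE  TRUE  TRUE  TRUE

# Flattening column by column lines the mask up with the rows of df1
res <- df1[as.vector(m), ]
```

Column j of the mask corresponds to group j, so the FALSE entries above the diagonal are exactly the leading rows to drop from each group.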
answered Nov 22 '25 10:11 by David Arenburg


Below, grp is 1 for each row of the first group, 2 for the second, and so on. Seq is 1, 2, 3, ... for the successive rows within each grp. Now just pick out the rows for which Seq is at least as large as grp. This removes the first i-1 rows from the ith group for i = 1, 2, ... .

grp <- cumsum(df1$year == 2000)
Seq <- ave(grp, grp, FUN = seq_along)
subset(df1, Seq >= grp)
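The intermediate vectors can be inspected directly to check the explanation. This sketch assumes df1 is rebuilt inline from the data in the question instead of being read from the CSV; the comments show the values for that data.

```r
# Rebuild the question's data inline (assumption: 4 repetitions of 2000:2004)
df1 <- data.frame(year  = rep(2000:2004, times = 4),
                  value = rep(1:5, times = 4))

grp <- cumsum(df1$year == 2000)        # group id, incremented at every 2000
Seq <- ave(grp, grp, FUN = seq_along)  # position of each row within its group

grp               # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
Seq               # 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
which(Seq >= grp) # rows kept: 1 2 3 4 5 7 8 9 10 13 14 15 19 20
```

Because grp is derived from year == 2000 rather than from a fixed group size, this version should also cope with groups of different lengths, as long as each group starts with 2000.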

We could alternatively write the subset statement above in a less general form, which hard-codes 5 rows per group and 4 groups:

subset(df1, 1:5 >= rep(1:4, each = 5))

Either subset statement gives the same output:

   year value
1  2000     1
2  2001     2
3  2002     3
4  2003     4
5  2004     5
7  2001     2
8  2002     3
9  2003     4
10 2004     5
13 2002     3
14 2003     4
15 2004     5
19 2003     4
20 2004     5
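The hard-coded 5 and 4 in the short form can be derived from the data instead. This is a sketch under the assumptions that every group has as many rows as there are distinct years and that the groups divide nrow(df1) evenly; k, n and keep are hypothetical names, and df1 is rebuilt inline from the question's data.

```r
# Rebuild the question's data inline (assumption: 4 repetitions of 2000:2004)
df1 <- data.frame(year  = rep(2000:2004, times = 4),
                  value = rep(1:5, times = 4))

k <- length(unique(df1$year))  # rows per group (assumed: one row per distinct year)
n <- nrow(df1) / k             # number of groups (assumed to divide evenly)

# Position within group (1..k repeated) compared with group number (1..n, each k times)
keep <- rep(seq_len(k), times = n) >= rep(seq_len(n), each = k)
res <- subset(df1, keep)
```

Writing both sides at full length avoids relying on the silent recycling of 1:5 in the short form.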
answered Nov 22 '25 11:11 by G. Grothendieck


