I'm not understanding how to create a new 'lagged' column in a data.frame. My current data is collected at the end of the data. One program I need to send this to assumes it's collected first thing in the morning, so I need to lag column 2 by 1 row. The code I wrote just returns the same data.
How can I do this properly?
Thanks.
D8 = structure(list(Date = structure(c(14396, 14397, 14398, 14399,
14400, 14403, 14404, 14405, 14406,
14407, 14410, 14411, 14412, 14413,
14414, 14417, 14418, 14419, 14420,
14421, 14424, 14425, 14426, 14427,
14428, 14431, 14432, 14433, 14434,
14435), class = "Date"),
PL8 = c(0, 0, 0, 0, 76, 0, -334, -974, -104, 356, 378, -1102,
-434, 266, -434, 444, 464, 0, 486, 406, -224, -214, 0, -4,
0, -188, 356, 322, -484, 436)), .Names = c("Date", "PL8"), row.names =
c(NA, 30L), class = "data.frame")
D8
D8[,3] = lag(D8[,2],k=-1)
D8
You can use the shift() function in pandas to create a column that displays the lagged values of another column. Note that the value in the shift() function indicates the number of values to calculate the lag for.
You can create a new DataFrame of a specific column by using DataFrame. assign() method. The assign() method assign new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones.
Try this:
transform(D8, PL8.lag = c(PL8[-1], NA))
It would be a bit easier if you used a time series class. In that case you could use lag
:
library(zoo)
z <- read.zoo(D8)
lag(z, 0:1)
In the other direction we would have:
transform(D8, PL8.lag = c(NA, head(PL8, -1)))
and
lag(z, 0:-1)
Here is an alternative solution:
D8$my.PL8.lag <- c(D8$PL8[-1], NA)
Here is code to lag a column by group using tapply
:
my.df = read.table(text = "
REFNO MONTH DAY YEAR STATE
1 3 5 2012 1
1 3 7 2012 2
1 3 10 2012 3
1 3 14 2012 NA
2 3 1 2012 20
2 3 10 2012 40
2 3 14 2012 60
2 3 17 2012 80
3 4 3 2012 -4
3 4 24 2012 -8
3 4 28 2012 -12
", header = TRUE, stringsAsFactors = FALSE)
desired.result = read.table(text = "
REFNO MONTH DAY YEAR STATE STATE.lag
1 3 5 2012 1 NA
1 3 7 2012 2 1
1 3 10 2012 3 2
1 3 14 2012 NA 3
2 3 1 2012 20 NA
2 3 10 2012 40 20
2 3 14 2012 60 40
2 3 17 2012 80 60
3 4 3 2012 -4 NA
3 4 24 2012 -8 -4
3 4 28 2012 -12 -8
", header = TRUE, stringsAsFactors = FALSE)
my.df$STATE.lag <- unlist(tapply(my.df$STATE, my.df$REFNO, function(x) {
c(NA, x[-length(x)])
}))
all.equal(my.df, desired.result)
# [1] TRUE
If the column you wish to lag is in Date
format you can use:
my.df$MY.DATE <- do.call(paste, list(my.df$MONTH, my.df$DAY, my.df$YEAR))
my.df$MY.DATE <- as.Date(my.df$MY.DATE, format=c("%m %d %Y"))
my.df$MY.DATE.lag <- as.Date(unlist(tapply(as.character(my.df$MY.DATE), my.df$REFNO,
function(x) { c(NA, x[-length(x)]) } )))
REFNO MONTH DAY YEAR STATE MY.DATE MY.DATE.lag
1 1 3 5 2012 1 2012-03-05 <NA>
2 1 3 7 2012 2 2012-03-07 2012-03-05
3 1 3 10 2012 3 2012-03-10 2012-03-07
4 1 3 14 2012 NA 2012-03-14 2012-03-10
5 2 3 1 2012 20 2012-03-01 <NA>
6 2 3 10 2012 40 2012-03-10 2012-03-01
7 2 3 14 2012 60 2012-03-14 2012-03-10
8 2 3 17 2012 80 2012-03-17 2012-03-14
9 3 4 3 2012 -4 2012-04-03 <NA>
10 3 4 24 2012 -8 2012-04-24 2012-04-03
11 3 4 28 2012 -12 2012-04-28 2012-04-24
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With