
How do I Difference Panel Data in R

Are there any R commands or packages that make it easy to add variables to a data.frame that hold the "difference" (the change over time) of existing variables, computed separately within each group?

If my data looked like this:

set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))
MyData

   Day Price    Good
1    1     8  apples
2    2    10  apples
3    3     7  apples
4    4    11  apples
5    5    14  apples
6    1    12 oranges
7    2    11 oranges
8    3     9 oranges
9    4    14 oranges
10   5    11 oranges

Then after "first differencing" the price variable, my data would look like this.

   Day Price    Good P1d
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3

Francis Smart asked Mar 21 '14 11:03



2 Answers

ave

transform(MyData, P1d = ave(Price, Good, FUN = function(x) c(NA, diff(x))))
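
Note that ave() differences rows in the order in which they appear, so the one-liner above relies on the data already being sorted by Day within each Good. A minimal sketch of a sort-first variant is given here; add_first_diff and the "_d1" suffix are names made up for illustration, and the data frame is rebuilt from the prices printed in the question rather than from the random draw.

```r
# Rebuild the example data using the prices shown in the question.
MyData <- data.frame(
  Day   = rep(1:5, times = 2),
  Price = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11),
  Good  = rep(c("apples", "oranges"), each = 5)
)

# Hypothetical helper (not from any package): sort by group and time,
# then first-difference each requested column within its group.
# Each new column is named after the original with the suffix "_d1".
add_first_diff <- function(data, vars, group, time) {
  data <- data[order(data[[group]], data[[time]]), , drop = FALSE]
  for (v in vars) {
    data[[paste0(v, "_d1")]] <- ave(
      data[[v]], data[[group]],
      FUN = function(x) c(NA, diff(x))  # NA for the first period of a group
    )
  }
  rownames(data) <- NULL
  data
}

# Rows in an arbitrary order, to show that sorting happens inside the helper.
shuffled <- MyData[c(10, 3, 7, 1, 5, 9, 2, 6, 4, 8), ]
result <- add_first_diff(shuffled, vars = "Price", group = "Good", time = "Day")
result
```

Passing several names in vars adds one differenced column per variable.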

ave/gsubfn

The previous solution can be shortened slightly using fn$ from the gsubfn package:

library(gsubfn)
transform(MyData, P1d = fn$ave(Price, Good, FUN = ~ c(NA, diff(x))))

dplyr

library(dplyr)

MyData %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup()

data.table

library(data.table)

dt <- data.table(MyData)
dt[, P1d := c(NA, diff(Price)), by = Good]
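
If several variables need differencing at once, the same grouped assignment can be applied to a set of columns. The following is only a sketch under stated assumptions: the Quantity column and its values are invented for illustration, the "_d1" suffix is an arbitrary naming choice, and shift() and setorder() are assumed to be available (they are part of recent data.table versions).

```r
library(data.table)

# Example data from the question, plus a hypothetical Quantity column
# (made-up values) so that there is more than one variable to difference.
dt <- data.table(
  Day      = rep(1:5, times = 2),
  Price    = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11),
  Quantity = c(3, 5, 4, 4, 6, 2, 2, 5, 7, 6),
  Good     = rep(c("apples", "oranges"), each = 5)
)

setorder(dt, Good, Day)             # ensure time order within each good
cols     <- c("Price", "Quantity")  # columns to difference
new_cols <- paste0(cols, "_d1")     # names for the differenced columns

# Within each good, subtract the previous row's value from the current one.
# shift() lags by one row and fills the first row of each group with NA.
dt[, (new_cols) := lapply(.SD, function(x) x - shift(x)),
   by = Good, .SDcols = cols]
dt
```

Here x - shift(x) plays the same role as c(NA, diff(x)) in the one-column version.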

Update

dplyr now uses %>% instead of %.%.

G. Grothendieck answered Sep 29 '22 16:09


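Another option is to reshape the data to wide format, difference each good's column, and reshape back:

library(reshape2)
library(dplyr)

# %>% is used here because current dplyr no longer provides %.%
MyNewData <- 
 MyData %>%
 melt(id.vars = c("Good", "Day")) %>%
 dcast(Day ~ Good, value.var = "value") %>%   # one column per good
 mutate(apples  = apples - lag(apples),
     oranges = oranges - lag(oranges)) %>%
 melt(id.vars = "Day", variable.name = "Good", value.name = "P1d") %>%
 merge(MyData) %>%                            # bring Price back in
 arrange(Good, Day)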

Regards

Miha Trošt answered Sep 29 '22 16:09