Baffling error using dataprep function in R Synth package

Tags:

r

I am trying to use the 'Synth' package in R to explore the effect that certain coups had on economic growth in the countries where they occurred, but I'm hung up on an error I can't understand. When I attempt to run dataprep(), I get the following:

Error in dataprep(foo = World, predictors = c("rgdpe.pc", "population.ln",  : 
  unit.variable not found as numeric variable in foo.

That's puzzling because my data frame, World, does include a numeric id called "idno" as specified in the call to dataprep().

Here is the script I'm using. It ingests a .csv with the requisite data from GitHub. The final step, the call to dataprep(), is where the error arises. I would appreciate help figuring out why this error arises and how to avoid it, so I can move on to the synth() step that follows.

library(dplyr)
library(Synth)

# DATA INGESTION AND TRANSFORMATION

World <- read.csv("https://raw.githubusercontent.com/ulfelder/coups-and-growth/master/data.raw.csv", stringsAsFactors=FALSE)

World$rgdpe.pc = World$rgdpe/World$pop # create per capita version of GDP (PPP)
World$idno = as.numeric(as.factor(World$country))  # create numeric country id
World$population.ln = log(World$population/1000)  # population size in 1000s, logged
World$trade.ln = log(World$trade)  # trade as % of GDP, logged
World$civtot.ln = log1p(World$civtot)  # civil conflict scale, +1 and logged
World$durable.ln = log1p(World$durable)  # political stability, +1 and logged
World$polscore = with(World, ifelse(polity >= -10, polity, NA)) # create version of Polity score that's missing for -66, -77, and -88
World <- World %>%  # create clocks counting years since last coup (attempt) or 1950, whichever is most recent
    arrange(countrycode, year) %>%
    mutate(cpt.succ.d = ifelse(cpt.succ.n > 0, 1, 0),
           cpt.any.d = ifelse(cpt.succ.n > 0 | cpt.fail.n > 0, 1, 0)) %>%
    group_by(countrycode, idx = cumsum(cpt.succ.d == 1L)) %>%
    mutate(cpt.succ.clock = row_number()) %>%
    ungroup() %>%
    select(-idx) %>%
    group_by(countrycode, idx = cumsum(cpt.any.d == 1L)) %>%
    mutate(cpt.any.clock = row_number()) %>%
    ungroup() %>%
    select(-idx) %>%
    mutate(cpt.succ.clock.ln = log1p(cpt.succ.clock), # include +1 log versions
           cpt.any.clock.ln = log1p(cpt.any.clock))

# THAILAND 2006

THI.coup.year = 2006

THI.years = seq(THI.coup.year - 5, THI.coup.year + 5)
# Get ids of countries that had no coup attempts during the window the analysis will cover. If you wanted to restrict the comparison to a
# specific region or in any other categorical way, this would be the place to do that as well.
THI.controls <- World %>%
    filter(year >= min(THI.years) & year <= max(THI.years)) %>% # filter to desired years
    group_by(idno) %>%  # organize by country
    summarise(coup.ever = sum(cpt.any.d)) %>%  # get counts by country of years with coup attempts during that period
    filter(coup.ever==0) %>%  # keep only the ones with 0 counts
    select(idno)  # cut down to numeric country ids
THI.controls = unlist(THI.controls)  # convert that data frame to a vector
names(THI.controls) = NULL  # strip the vector of names

THI.synth.dat <- dataprep(

    foo = World,

    predictors = c("rgdpe.pc", "population.ln", "trade.ln", "fcf", "govfce", "energy.gni", "polscore", "durable.ln", "cpt.any.clock.ln", "civtot.ln"),
    predictors.op = "mean",
    time.predictors.prior = seq(from = min(THI.years), to = THI.coup.year - 1),

    dependent = "rgdpe.pc",

    unit.variable = "idno",
    unit.names.variable = "country",
    time.variable = "year",

    treatment.identifier = unique(World$idno[World$country=="Thailand"]),
    controls.identifier = THI.controls,

    time.optimize.ssr = seq(from = THI.coup.year, to = max(THI.years)),
    time.plot = THI.years

)
asked Aug 29 '15 13:08 by ulfelder


1 Answer

Too long for a comment.

Your dplyr statement:

World <- World %>% ...

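converts World from a plain data.frame to a tbl_df object (see the dplyr documentation). Subsetting a tbl_df with [ does not drop a single column to a vector, so World[, "idno"] is still a one-column tbl_df. As a result, mode(World[, "idno"]) returns "list" rather than "numeric", and dataprep()'s check for a numeric unit.variable fails.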

You can fix this by using

`World <- as.data.frame(World)`

just before the call to dataprep(...).
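To see the mechanism without loading dplyr, here is a minimal base-R sketch. It assumes, as described above, that dataprep() decides whether unit.variable is numeric by calling mode() on foo[, unit.variable]. The toy World data frame is hypothetical, and drop = FALSE is used only to imitate the tbl_df behaviour of keeping a single selected column as a one-column data frame.

```r
# Hypothetical toy panel standing in for the real World data
World <- data.frame(
  idno    = c(1, 1, 2, 2),
  country = c("A", "A", "B", "B"),
  year    = c(2005, 2006, 2005, 2006),
  stringsAsFactors = FALSE
)

# Plain data.frame: selecting one column drops to a numeric vector
plain.mode <- mode(World[, "idno"])

# Imitating tbl_df: the single column stays a one-column data frame,
# and a data frame is a list, so mode() reports "list"
tbl.like.mode <- mode(World[, "idno", drop = FALSE])

print(plain.mode)
print(tbl.like.mode)
```

On the real data, class(World) after the dplyr pipeline should include "tbl_df"; running World <- as.data.frame(World) before dataprep() turns World[, "idno"] back into a plain numeric vector, which is what the numeric check expects.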

Unfortunately (again), you now get a different error, which may be due to the logic of your dplyr statement.

answered Oct 21 '22 12:10 by jlhoward