Converting R file to Stata with missing string values

Tags: r, stata

I am getting an error while converting an R file into Stata format. I am able to write the numeric columns to a Stata file, but when I include strings I get the following error:

library(foreign)
write.dta(newdata, "X.dta")

Error in write.dta(newdata, "X.dta") : 
  empty string is not valid in Stata's documented format

I have a few string columns, such as location and name, that contain missing values, which is probably what causes this problem. Is there a way to handle this?

user3570187 asked Dec 19 '14 21:12



1 Answer

I've had this error many times before, and it's easy to reproduce:

library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')

One solution is to use factor variables instead of character variables, e.g.,

for (colname in names(test)) {
  if (is.character(test[[colname]])) {
    test[[colname]] <- as.factor(test[[colname]])
  }
}

Another is to change the empty strings to something else and change them back in Stata.
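
A minimal sketch of that workaround, assuming a placeholder value ("__EMPTY__" here, a made-up name) that never occurs in the real data. The temporary output path is also just for illustration:

```r
library(foreign)

# Hypothetical placeholder: any value that cannot occur in the real data.
placeholder <- "__EMPTY__"

test <- data.frame(a = c("", "x"), b = 1:2, stringsAsFactors = FALSE)

# Replace empty (and NA) strings, touching character columns only.
is_chr <- vapply(test, is.character, logical(1))
test[is_chr] <- lapply(test[is_chr], function(x) {
  x[is.na(x) | x == ""] <- placeholder
  x
})

# Every string is now non-empty, so write.dta has nothing to reject.
out_file <- file.path(tempdir(), "example.dta")
write.dta(test, out_file)
```

On the Stata side, something along the lines of replace a = "" if a == "__EMPTY__" should then restore the empty strings for each affected variable.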

This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.

Update (2015-12-04): A better solution is to use write_dta in the haven package:

library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')

This way, Stata reads string variables properly as strings.

Frank answered Oct 05 '22 19:10