I am getting an error while converting an R data frame to Stata format. I can write the numeric columns to a Stata file, but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have a few string variables, such as location and name, that contain missing values, which is probably what causes the problem. Is there a way to handle this?
In Stata, we can use "." and the lettered codes .a through .z to indicate the type of missing value. For example, suppose a variable female has the value -999 to indicate that the subject refused to answer the question and the value -99 to indicate a data entry error.
Stata doesn't attach any special meaning to the string "NA". Necessarily, no single string can capture "not available", "not applicable", "refused to answer", "test-tube dropped on floor", and the many other reasons why string values might be missing or not directly informative.
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
  if (is.character(test[[colname]])) {
    test[[colname]] <- as.factor(test[[colname]])
  }
}
Another is to change the empty strings to something else and change them back in Stata.
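A minimal sketch of that placeholder approach, in case it helps. The helper name replace_empty_strings and the placeholder "__EMPTY__" are made up for illustration; pick any marker that cannot occur in your real data.

```r
# Hypothetical helper: swap "" for a placeholder in every character column.
replace_empty_strings <- function(df, placeholder = "__EMPTY__") {
  for (colname in names(df)) {
    col <- df[[colname]]
    if (is.character(col)) {
      # which() drops NA comparisons, so genuine NA values are left untouched
      col[which(col == "")] <- placeholder
      df[[colname]] <- col
    }
  }
  df
}

test <- data.frame(a = c("", "x"), b = c(1, 2), stringsAsFactors = FALSE)
fixed <- replace_empty_strings(test)

# Write the patched copy only if the foreign package is installed
if (requireNamespace("foreign", quietly = TRUE)) {
  foreign::write.dta(fixed, file.path(tempdir(), "example.dta"))
}
```

On the Stata side, something like replace a = "" if a == "__EMPTY__" should turn the placeholder back into an empty string for each affected variable.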
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update (2015-12-04): A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
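If you want to check the result without opening Stata, one option is to read the file back into R with haven's read_dta. This is only a rough round-trip sketch: the temporary path is my own choice, and it skips itself if haven is not installed.

```r
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
path <- file.path(tempdir(), "example.dta")

back <- NULL
if (requireNamespace("haven", quietly = TRUE)) {
  haven::write_dta(test, path)
  back <- haven::read_dta(path)
  # Inspect the column types and values that came back
  str(back)
}
```

If column a shows up as a character column, the string variable made it through the write step.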