I recently downloaded some data in ASCII format that came with SAS setup files, which I would like to use with R. One such data file is here:
https://dl.dropboxusercontent.com/u/8474088/Data.txt
with corresponding SAS setup file here:
https://dl.dropboxusercontent.com/u/8474088/Setup.sas
I should note that the setup file is designed to work with around 50 different data files all with similar structure (the link above is an example of one of these).
I thought I was in good shape after finding the SAScii package, but I have been unable to get read.SAScii or parse.SAScii to work with these files. Both commands give the same error:
read.SAScii(data.file,setup.file,beginline=581)
Error in if (as.numeric(x[j, "start"]) > as.numeric(x[j - 1, "end"]) + :
missing value where TRUE/FALSE needed
In addition: Warning message:
NAs introduced by coercion
parse.SAScii(setup.file,beginline=581)
Error in if (as.numeric(x[j, "start"]) > as.numeric(x[j - 1, "end"]) + :
missing value where TRUE/FALSE needed
In addition: Warning message:
NAs introduced by coercion
The examples given in the SAScii documentation use much simpler setup files so I am wondering if the complexity of the above file is causing the issue (for example the information on VALUE listed in the file prior to the INPUT command).
Any thoughts on how to proceed would be great. Thanks in advance.
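For readers puzzling over the message itself, the error and the warning in the output above can be reproduced in isolation. This is only a minimal sketch of the mechanism: the `start` and `end` vectors are hypothetical stand-ins, not values taken from the actual setup file, and the comparison only loosely mirrors the one shown in the traceback.

```r
# hypothetical column positions as text; "abc" stands in for any token
# that cannot be converted to a number
start <- c( "1" , "abc" )
end <- c( "4" , "9" )

# as.numeric() turns the non-numeric token into NA
# (this is where "NAs introduced by coercion" would normally be warned)
start.num <- suppressWarnings( as.numeric( start ) )
end.num <- as.numeric( end )

# any comparison involving NA is itself NA
cond <- start.num[ 2 ] > end.num[ 1 ] + 1

# if() cannot branch on NA, so it raises an error; capture its message
msg <- tryCatch(
    if ( cond ) "gap" else "no gap" ,
    error = function( e ) conditionMessage( e )
)
print( msg )
```

So the error suggests that at least one start or end position in the parsed setup file was not read as a number.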
as noted in the details section of the parse.SAScii help, this package cannot read overlapping columns, and your file clearly has 'em. ;) in order for SAScii to work, you'll have to break the .sas file into four separate .sas files on your hard drive. here's how:
# load all necessary libraries
library(stringr)
library(SAScii)
library(downloader)
# create two temporary files
tf <- tempfile()
tf2 <- tempfile()
# download the sas import script
download( "https://dl.dropboxusercontent.com/u/8474088/Setup.sas" , tf )
# download the actual data file
download( "https://dl.dropboxusercontent.com/u/8474088/Data.txt" , tf2 )
# read the sas importation instructions into R
z <- readLines( tf )
# here are the break points
z[ substr( str_trim( z ) , 1 , 1 ) == '#' ]
sas.script.breakpoints <- which( substr( str_trim( z ) , 1 , 1 ) == '#' )
script.one <- z[ 581:sas.script.breakpoints[1] ]
script.two <- z[ sas.script.breakpoints[1]:sas.script.breakpoints[2] ]
script.three <- z[ sas.script.breakpoints[2]:sas.script.breakpoints[3] ]
script.four <- z[ sas.script.breakpoints[3]:length(z) ]
# replace some stuff so these look like recognizable sas scripts
script.one[ length( script.one ) ] <- ";"
script.two[ 1 ] <- "input blank 1-300"
script.two[ length( script.two ) ] <- ";"
script.three[ 1 ] <- "input blank 1-300"
script.three[ length( script.three ) ] <- ";"
script.four[ 1 ] <- "input blank 1-300"
# test then import data set one
writeLines( script.one , tf )
parse.SAScii( tf )
x1 <- read.SAScii( tf2 , tf )
# test then import data set two
writeLines( script.two , tf )
parse.SAScii( tf )
x2 <- read.SAScii( tf2 , tf )
# test then import data set three
writeLines( script.three , tf )
parse.SAScii( tf )
x3 <- read.SAScii( tf2 , tf )
# test then import data set four
writeLines( script.four , tf )
parse.SAScii( tf )
x4 <- read.SAScii( tf2 , tf )
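since the same setup file is meant to cover around 50 data files, the splitting step above could be generalized so the number of break points isn't hard-coded. what follows is only a rough sketch under stated assumptions: the vector `z` here is a made-up toy stand-in for the setup file lines (the real file starts its INPUT block at line 581 and looks different), and `trimws` from base R is swapped in for `str_trim` so the snippet runs without extra packages.

```r
# hypothetical stand-in for the lines of a setup file (not the real one)
z <- c(
    "input" ,
    "  id 1-4" ,
    "  age 5-6" ,
    "#2" ,
    "  income 1-8" ,
    "#3" ,
    "  weight 1-5" ,
    ";"
)

# positions of lines whose first non-blank character is '#'
breakpoints <- which( substr( trimws( z ) , 1 , 1 ) == "#" )

# each chunk runs from one break point to the next;
# the first starts at line 1 here (581 in the real file)
starts <- c( 1 , breakpoints )
ends <- c( breakpoints , length( z ) )
chunks <- Map( function( s , e ) z[ s:e ] , starts , ends )

# patch each chunk the same way as script.one through script.four above:
# every chunk but the first gets a placeholder input line, and
# every chunk but the last gets a closing semicolon
n <- length( chunks )
for ( i in seq_len( n ) ) {
    if ( i > 1 ) chunks[[ i ]][ 1 ] <- "input blank 1-300"
    if ( i < n ) chunks[[ i ]][ length( chunks[[ i ]] ) ] <- ";"
}

print( chunks )
```

each element of `chunks` could then be written out with writeLines and handed to parse.SAScii and read.SAScii in a loop, the same way the four scripts are handled above.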