
read.sas7bdat unable to read compressed file

Tags: r, sas

I am trying to read a .sas7bdat file in R. When I use the command

library(sas7bdat)
read.sas7bdat("filename")

I get the following error:

Error in read.sas7bdat("county2.sas7bdat") : file contains compressed data

I do not have experience with SAS, so any help will be highly appreciated.

Thanks!

user3641630 Avatar asked Jun 16 '14 15:06





2 Answers

According to the sas7bdat vignette (vignette('sas7bdat')), COMPRESS=BINARY (or COMPRESS=YES) was not supported as of 2013; that was the current vignette when I wrote this on 6/16/2014. COMPRESS=CHAR is supported.

These are SAS's internal compression routines, intended to make files smaller. They compress far less well than gzip or similar tools, but SAS reads and writes them transparently, so SAS programs need no changes. They do change the file format significantly, which is why the package does not implement them yet.

If you have SAS, rewrite the file as an uncompressed dataset:

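/* Write new datasets without compression */
options compress=no;
/* Point a libref at the folder that holds the .sas7bdat file */
libname lib '//drive/path/to/files';
/* Copy the compressed dataset (have) into an uncompressed one (want) */
data lib.want;
    set lib.have;
run;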

That is the simplest of many ways. It assumes the libname lib is defined as above and that you replace have and want with real names: have should, in most cases, be the file name without its extension, and want can be any sensible name of 32 or fewer characters using A-Z or underscore only.

If you don't have SAS, you'll have to ask your data provider to make the data available uncompressed or in a different format. If you're getting this from a public-use data set (PUDS) somewhere on the web, you might post where you're getting it from, and someone may be able to help you identify an uncompressed source.

Joe Avatar answered Sep 22 '22 09:09



This is admittedly not a pure R solution, but in many situations (e.g. if you aren't on a PC and can't rewrite the SAS file yourself) the other solutions posted are not workable.

Fortunately, Python has a module (https://pypi.python.org/pypi/sas7bdat) that supports reading compressed SAS data sets. Using it is certainly better than having to acquire SAS if you don't already have it. Once you extract the data and save it to a text file with Python, you can read it in R.

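from sas7bdat import SAS7BDAT
import pandas as pd  # to_data_frame() returns a pandas DataFrame

InFileName = "myfile.sas7bdat"
OutFileName = "myfile.txt"

# Read the (possibly compressed) SAS dataset into a DataFrame
with SAS7BDAT(InFileName) as f:
    df = f.to_data_frame()

# Write a tab-delimited UTF-8 text file, without the row index, for R to read
df.to_csv(path_or_buf=OutFileName, sep="\t", encoding="utf-8", index=False)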
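
Before moving over to R, it can be worth checking that the exported text file looks the way you expect. The following is only a minimal sketch using Python's standard csv module: the helper summarize_tsv is a hypothetical name, and the demo file with its county and population columns is made-up stand-in data, not the real export.

```python
import csv
import os
import tempfile


def summarize_tsv(path):
    """Return (column_names, row_count) for a tab-delimited text file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)  # first line holds the column names
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count


# Stand-in for the exported file; the columns and values are invented.
demo_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(demo_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["county", "population"])
    writer.writerow(["Adams", "1200"])
    writer.writerow(["Baker", "3400"])

columns, n_rows = summarize_tsv(demo_path)
print(columns, n_rows)
```

In practice you would call summarize_tsv on your own output file and compare the column names and row count with what you expect from the SAS dataset. The sketch does not handle an empty file.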
Michael Ohlrogge Avatar answered Sep 22 '22 09:09
