
Converting large SAS datasets to HDF5

I have multiple large (>10GB) SAS datasets that I want to convert for use in pandas, preferably in HDF5. There are many different data types (dates, numerical, text), and some numerical fields also have different error codes for missing values (e.g. values can be ., .E, .C, etc.). I'm hoping to keep the column names and label metadata as well. Has anyone found an efficient way to do this?

I tried using MySQL as a bridge between the two, but I got some "Out of range" errors when transferring, and it was incredibly slow. I also tried exporting from SAS in Stata .dta format, but SAS (9.3) exports in an old Stata format that is not compatible with read_stata() in pandas. Finally, I tried the sas7bdat package, but according to its description it has not been widely tested, so I'd like to load the datasets another way and compare the results to make sure everything is working properly.

Extra details: the datasets I'm looking to convert are those from CRSP, Compustat, IBES and TFN from WRDS.

asked Feb 10 '14 01:02 by vgregoire



1 Answer

I haven't had much luck with this in the past. Where I work, we just use tab-separated files for transport between SAS and Python, and we do it a lot.
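For the Python side of that tab-separated route, a minimal sketch might look like the following. It assumes the file has already been exported from SAS with a header row, and that missing values appear in the text literally as ., .E or .C (check your own export, since this is an assumption). The function names, the `data` key and the chunk size are made up for illustration. Note that this collapses all the missing codes to NaN rather than keeping them distinct, and that writing to HDF5 requires PyTables to be installed.

```python
import io

import pandas as pd

# Assumption: the SAS export writes missing values literally as these strings.
SAS_MISSING_CODES = [".", ".E", ".C"]


def iter_tsv_chunks(source, chunksize=100_000, missing_codes=SAS_MISSING_CODES):
    """Yield DataFrames of at most `chunksize` rows from a tab-separated file."""
    reader = pd.read_csv(
        source,
        sep="\t",
        na_values=missing_codes,  # treat the SAS codes as NaN
        chunksize=chunksize,      # read lazily instead of loading >10GB at once
    )
    for chunk in reader:
        yield chunk


def tsv_to_hdf5(source, h5_path, key="data", chunksize=100_000):
    """Append each chunk to one table in an HDF5 file (needs PyTables)."""
    with pd.HDFStore(h5_path, mode="w") as store:
        for chunk in iter_tsv_chunks(source, chunksize=chunksize):
            # "table" format is the appendable one; "fixed" cannot be appended to.
            store.append(key, chunk, format="table")


# Small in-memory demo of the chunked reader (no HDF5 file is written here).
sample = io.StringIO("permno\tret\tname\n1\t0.5\tA\n2\t.E\tB\n3\t.\tC\n")
chunks = list(iter_tsv_chunks(sample, chunksize=2))
print(sum(len(c) for c in chunks), "rows read in", len(chunks), "chunks")
```

On real data you would likely need to pass explicit `dtype` and `parse_dates` arguments to `read_csv` and a `min_itemsize` to `store.append`, because column types and string widths inferred from one chunk may not match the next. Column labels are not carried by a tab-separated file and would have to be exported and stored separately.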

That said, if you are on Windows, you can try setting up an ODBC connection and writing the file that way.

answered Oct 19 '22 16:10 by DomPazz