Is there a module for Python to open IBM SPSS (i.e. .sav) files? It would be great if there's something up-to-date which doesn't require any additional dll files/libraries.
You could use a Python interface to R and then import the data using read.spss from library(foreign).
How to Read an SPSS File in Python Using Pandas. A .sav file can be read into a Pandas dataframe with the read_spss function. Once the file is loaded, the last 5 rows of the dataframe can be printed with the Pandas tail method.
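As a rough illustration of the read_spss approach described above, here is a minimal sketch. The helper name read_sav_tail and the file name in the usage comment are made up for this example; it assumes a pandas version that provides read_spss, and pandas needs the pyreadstat package installed for that function to work.

```python
def read_sav_tail(path, n=5, columns=None):
    """Read an SPSS .sav file with pandas and return its last n rows."""
    # Imported inside the function so a module containing this sketch
    # still loads when pandas is not installed.
    import pandas as pd

    # read_spss relies on the pyreadstat package being installed.
    # usecols=None reads every column; convert_categoricals=True turns
    # value-labelled variables into pandas Categorical columns.
    df = pd.read_spss(path, usecols=columns, convert_categoricals=True)
    return df.tail(n)


# Usage (the file name is a placeholder, not a file from this thread):
#     print(read_sav_tail("survey.sav"))
```

Passing a list of column names as columns should limit the read to those variables, which can help with wide survey files.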
In addition to files saved in IBM® SPSS® Statistics format, you can open Excel, SAS, Stata, tab-delimited, and other files without converting the files to an intermediate format or entering data definition information. Opening a data file makes it the active dataset.
I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.
The package is self-contained. It does not require R (no need to install an additional application) and it does not depend on IBM DLLs or other external libraries.
For example, in order to read an SPSS sav file you would do:
    import pyreadstat
    df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")
Here df is a pandas dataframe and meta contains metadata such as variable labels or value labels. read_sav reads both sav and zsav (compressed) files. There is also a function read_por for old por (portable) files.
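To show what the metadata object can be used for, here is a small sketch. The helper name read_sav_with_labels is made up for this example; apply_value_formats, column_names_to_labels and variable_value_labels are taken from pyreadstat's documented interface as I understand it, so check them against the version you have installed.

```python
def read_sav_with_labels(path, labelled_values=True):
    """Read a .sav/.zsav file and return the data plus its label metadata."""
    # Imported inside the function so a module containing this sketch
    # still loads when pyreadstat is not installed.
    import pyreadstat

    # apply_value_formats=True replaces coded values (e.g. 1, 2) with
    # their value labels (e.g. "Male", "Female") in the dataframe.
    df, meta = pyreadstat.read_sav(path, apply_value_formats=labelled_values)

    # Mapping of column name -> variable label.
    column_labels = meta.column_names_to_labels
    # Mapping of column name -> {coded value: value label}.
    value_labels = meta.variable_value_labels
    return df, column_labels, value_labels


# Usage (the path is the placeholder from the answer above):
#     df, column_labels, value_labels = read_sav_with_labels("/path/to/sav/file.sav")
#     print(column_labels)
```

Returning the label dictionaries separately keeps the dataframe itself unchanged when labelled_values is False, so the raw codes stay available for analysis.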
You can find it here: https://github.com/Roche/pyreadstat
Depending on what you want to do--process data using R-related commands from rpy2, or switch to Python--the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.
Otherwise, older versions of Pandas included a convenient wrapper for rpy2 (the pandas.rpy module, which has since been removed from Pandas). Here is an example of use with Peat and Barton's weights.sav data set:
>>> import pandas.rpy.common as com
>>> filename = "weights.sav"
>>> w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
>>> w = com.convert_robj(w)
>>> w.head()
     ID  WEIGHT  LENGTH  HEADC  GENDER  EDUCATIO              PARITY
1  L001    3.95    55.5   37.5  Female  tertiary  3 or more siblings
2  L003    4.63    57.0   38.5  Female  tertiary           Singleton
3  L004    4.75    56.0   38.5    Male    year12          2 siblings
4  L005    3.92    56.0   39.0    Male  tertiary         One sibling
5  L006    4.56    55.0   39.5    Male    year10          2 siblings