
Is there a Python module to open SPSS files?

Is there a module for Python to open IBM SPSS (i.e. .sav) files? Ideally it would be something up to date that doesn't require any additional DLL files or libraries.

Lamps1829 asked Feb 01 '13 13:02



2 Answers

I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.

The package is self-contained. It does not require R (no need to install an additional application), and it does not depend on IBM DLLs or other external libraries.

For example, to read an SPSS sav file you would do:

    import pyreadstat

    df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")

df is a pandas DataFrame. meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function, read_por, for old por (portable) files.
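As a rough illustration of what the metadata can be used for, here is a minimal sketch that collects the variable label and value labels for each column. It assumes the metadata object exposes the attributes column_names_to_labels and variable_value_labels. The helper names summarize_variables and read_sav_with_summary, and the stand-in metadata, are made up for this example and are not part of pyreadstat.

```python
from types import SimpleNamespace


def summarize_variables(meta):
    """Map each column name to its variable label and value labels.

    Assumes `meta` exposes `column_names_to_labels` and
    `variable_value_labels` (both dicts keyed by column name).
    """
    summary = {}
    for name, label in meta.column_names_to_labels.items():
        summary[name] = {
            "label": label,
            # Columns without coded values get an empty mapping.
            "values": meta.variable_value_labels.get(name, {}),
        }
    return summary


def read_sav_with_summary(path):
    """Read a .sav file and return (df, summary). Requires pyreadstat."""
    import pyreadstat  # imported lazily so the helper above has no dependencies

    df, meta = pyreadstat.read_sav(path)
    return df, summarize_variables(meta)


# Stand-in metadata, invented for illustration, so the helper can be
# tried without an actual .sav file.
fake_meta = SimpleNamespace(
    column_names_to_labels={"GENDER": "Gender", "WEIGHT": "Weight (kg)"},
    variable_value_labels={"GENDER": {1.0: "Male", 2.0: "Female"}},
)
summary = summarize_variables(fake_meta)
```

With a real file, calling read_sav_with_summary("/path/to/sav/file.sav") would return the DataFrame together with the same kind of summary.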

You can find it here: https://github.com/Roche/pyreadstat

Otto Fajardo answered Sep 22 '22 07:09


Depending on what you want to do--process data using R-related commands from rpy2, or switch to Python--the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.

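Otherwise, older versions of pandas included a convenient wrapper for rpy2 (the pandas.rpy module, which was deprecated in pandas 0.16 and removed in 0.20). Here is an example of its use with Peat and Barton's weights.sav data set: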

    >>> import pandas.rpy.common as com
    >>> filename = "weights.sav"
    >>> w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    >>> w = com.convert_robj(w)
    >>> w.head()
         ID  WEIGHT  LENGTH  HEADC  GENDER  EDUCATIO              PARITY
    1  L001    3.95    55.5   37.5  Female  tertiary  3 or more siblings
    2  L003    4.63    57.0   38.5  Female  tertiary           Singleton
    3  L004    4.75    56.0   38.5    Male    year12          2 siblings
    4  L005    3.92    56.0   39.0    Male  tertiary         One sibling
    5  L006    4.56    55.0   39.5    Male    year10          2 siblings

chl answered Sep 21 '22 07:09