Is there a Python module for opening IBM SPSS (i.e. .sav) files? It would be great if there were something up to date that doesn't require any additional DLL files or libraries.

Is there a Python module to open SPSS files?

2 Answers

I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.

The package is self-contained. It does not require R (no need to install an additional application) and it does not depend on IBM DLLs or other external libraries.

For example, to read an SPSS sav file you would do:

    import pyreadstat

    df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")

df is a pandas DataFrame. meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function, read_por, for old por (portable) files.
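To make the metadata part more concrete, here is a minimal sketch of turning the metadata into a small codebook. It assumes the metadata object exposes the dictionaries column_names_to_labels (variable name to variable label) and variable_value_labels (variable name to a dict of value to label); the helper names describe_columns and load_sav_with_codebook are made up for this illustration and are not part of pyreadstat.

```python
def describe_columns(column_names_to_labels, variable_value_labels):
    """Combine variable labels and value labels into one codebook dict.

    Both arguments are plain dicts shaped like the pyreadstat metadata
    attributes of the same names (an assumption of this sketch).
    """
    codebook = {}
    for name, label in column_names_to_labels.items():
        codebook[name] = {
            "label": label,
            # Variables without value labels get an empty mapping.
            "values": dict(variable_value_labels.get(name, {})),
        }
    return codebook


def load_sav_with_codebook(path):
    """Read a .sav/.zsav file and return (DataFrame, codebook)."""
    # Imported here so describe_columns can be used without pyreadstat.
    import pyreadstat

    df, meta = pyreadstat.read_sav(path)
    codebook = describe_columns(
        meta.column_names_to_labels, meta.variable_value_labels
    )
    return df, codebook
```

Calling load_sav_with_codebook("/path/to/sav/file.sav") would then give you the DataFrame together with a dict keyed by variable name, which is handy for a quick look at what each coded column means.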

You can find it here: https://github.com/Roche/pyreadstat

answered Sep 22 '22 07:09

Otto Fajardo

Depending on what you want to do (process the data using R-related commands from rpy2, or switch to Python), the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.

Otherwise, older versions of pandas included a convenient wrapper for rpy2 (the pandas.rpy module, which has since been removed from pandas). Here is an example of its use with Peat and Barton's weights.sav data set:

    >>> import pandas.rpy.common as com
    >>> filename = "weights.sav"
    >>> w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    >>> w = com.convert_robj(w)
    >>> w.head()
         ID  WEIGHT  LENGTH  HEADC  GENDER  EDUCATIO              PARITY
    1  L001    3.95    55.5   37.5  Female  tertiary  3 or more siblings
    2  L003    4.63    57.0   38.5  Female  tertiary           Singleton
    3  L004    4.75    56.0   38.5    Male    year12          2 siblings
    4  L005    3.92    56.0   39.0    Male  tertiary         One sibling
    5  L006    4.56    55.0   39.5    Male    year10          2 siblings

answered Sep 21 '22 07:09

chl

Tags:

python

python-module

statistics

dataset

spss

Lamps1829
