I'm wondering if anyone knows a Python package that allows you to save numpy arrays/recarrays in the .dta
format of the statistical data analysis software Stata. This would really speed up a few steps in a system I have.
In Excel, choose File, then Open, and set the file type to comma-separated values (Excel expects those files to have a .csv extension). You can then select the file and open it in Excel. The Stata help file for outsheet explains this in more detail.
Stata data files have a .dta extension. See the “How to Import an Excel or Text Data File into Stata” handout for information on how to import other types of data files into Stata.
The scikits.statsmodels package includes a reader for Stata data files, which relies in part on PyDTA as pointed out by @Sven. In particular, genfromdta() will return an ndarray, e.g. from Python 2.7/statsmodels 0.3.1:
>>> import scikits.statsmodels.api as sm
>>> arr = sm.iolib.genfromdta('/Applications/Stata12/auto.dta')
>>> type(arr)
<type 'numpy.ndarray'>
The savetxt() function can in turn be used to save an array as a text file, which can then be imported into Stata. For example, we can export the above as
>>> sm.iolib.savetxt('auto.txt', arr, fmt='%2s', delimiter=",")
and read it in Stata without a dictionary file as follows:
. insheet using auto.txt, clear
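If you also want variable names to carry over, one option is to write a header row before the data. Here is a minimal sketch using only numpy and the standard csv module; the helper name write_csv_with_header and the sample rows are made up for illustration, and it assumes insheet picks up names from the first line, which is worth checking on your Stata version.

```python
import csv
import io

import numpy as np


def write_csv_with_header(arr, handle):
    """Write a structured array to an open text handle as CSV, field names first."""
    writer = csv.writer(handle)
    writer.writerow(arr.dtype.names)  # header row: one name per field
    writer.writerows(arr.tolist())    # tolist() yields plain Python tuples


# Made-up rows for illustration only (hypothetical data, not from auto.dta).
sample = np.array(
    [("car_a", 4100, 22), ("car_b", 5200, 20)],
    dtype=[("make", "U18"), ("price", "i4"), ("mpg", "i4")],
)

# Write to an in-memory buffer here; a real file handle works the same way.
buffer = io.StringIO()
write_csv_with_header(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a handle opened with open('auto.csv', 'w', newline='') so the csv module controls the line endings.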
I believe a *.dta writer should be added in the near future.
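For what it's worth, pandas (not part of the setup described above, so treat this as an assumption about your environment) provides DataFrame.to_stata() and read_stata(), which can write and read .dta files directly. A rough sketch of a round trip from a structured array, with made-up field names and values:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up rows for illustration only (hypothetical data).
records = np.array(
    [(4100, 22, 3.58), (5200, 20, 2.93)],
    dtype=[("price", "i4"), ("mpg", "i4"), ("gear_ratio", "f8")],
)

# Field names of the structured array become the DataFrame columns.
frame = pd.DataFrame.from_records(records)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    # write_index=False keeps the pandas row index out of the Stata file.
    frame.to_stata(path, write_index=False)
    # Read the file back to check the round trip.
    restored = pd.read_stata(path)
```

This skips the text-file step entirely, at the cost of an extra dependency.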