Run Stata do file from Python

Tags: python, stata

I have a Python script that cleans up and performs basic statistical calculations on a large panel dataset (2,000,000+ observations).

I find that some of these tasks are better suited to Stata, and wrote a do file with the necessary commands. Thus, I want to run a .do file within my Python code. How would I go about calling a .do file from Python?

svenkatesh asked Jan 21 '14 16:01



1 Answer

I think @user229552 points in the right direction: Python's subprocess module can be used. Below is an example that works for me on Linux.

Suppose you have a Python file called pydo.py with the following:

import subprocess

## Do some processing in Python

## Set do-file information
dofile = "/home/roberto/Desktop/pyexample3.do"
cmd = ["stata", "do", dofile, "mpg", "weight", "foreign"]

## Run do-file
subprocess.call(cmd) 

and a Stata do-file named pyexample3.do, with the following:

clear all
set more off

local y `1'
local x1 `2'
local x2 `3'

display `"first parameter: `y'"'
display `"second parameter: `x1'"'
display `"third parameter: `x2'"'

sysuse auto
regress `y' `x1' `x2'

exit, STATA clear

Then executing pydo.py in a Terminal window works as expected.
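If the Python script needs Stata's output as a string, rather than just letting it print to the terminal, one possible variation is subprocess.run with captured output. This is only a sketch and not part of the original example: build_stata_cmd and run_dofile are hypothetical helper names, the executable name "stata" is assumed to be on the PATH, and the optional -b flag is the batch mode described in the Stata FAQ listed under Sources. It assumes Python 3.7 or later.

```python
import subprocess


def build_stata_cmd(dofile, *params, executable="stata", batch=False):
    """Return the argument list for running a do-file with parameters.

    Hypothetical helper: 'executable' and 'batch' are illustrative options.
    """
    cmd = [executable]
    if batch:
        # Batch mode: Stata is expected to write its output to a log file
        # instead of the terminal (see the batch-mode FAQ).
        cmd.append("-b")
    cmd += ["do", dofile, *params]
    return cmd


def run_dofile(dofile, *params, executable="stata", batch=False):
    """Run the do-file and return a CompletedProcess with captured text."""
    cmd = build_stata_cmd(dofile, *params, executable=executable, batch=batch)
    # capture_output/text require Python 3.7+
    return subprocess.run(cmd, capture_output=True, text=True)


# Example use (requires Stata to be installed):
#   result = run_dofile("/home/roberto/Desktop/pyexample3.do",
#                       "mpg", "weight", "foreign")
#   print(result.stdout)
```

As in the example above, the do-file should end with exit, STATA clear so that Stata terminates and control returns to Python.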

You could also define a Python function and use that:

import subprocess

## Define a Python function to launch a do-file
def dostata(dofile, *params):
    ## Launch a do-file, given the full path to the do-file
    ## and any number of parameters. Returns Stata's exit status.
    cmd = ["stata", "do", dofile] + list(params)
    return subprocess.call(cmd)

## Do some processing in Python

## Run a do-file
dostata("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign")
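The function assumes that an executable called stata can be found on the PATH. Depending on the installation, the console executable may have a different name (for example stata-se or stata-mp), so it can help to fail early with a clear message. The following is a sketch under that assumption; dostata_checked and its executable parameter are hypothetical additions, not part of the original answer.

```python
import shutil
import subprocess


def dostata_checked(dofile, *params, executable="stata"):
    """Launch a do-file, but first verify that the Stata executable exists.

    'executable' is an illustrative parameter; adjust it to your install.
    """
    path = shutil.which(executable)  # None if not found on the PATH
    if path is None:
        raise FileNotFoundError(
            "Could not find %r on the PATH; is Stata installed?" % executable
        )
    return subprocess.call([path, "do", dofile, *params])


# Example use (requires Stata to be installed):
#   dostata_checked("/home/roberto/Desktop/pyexample3.do",
#                   "mpg", "weight", "foreign", executable="stata-mp")
```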

The complete call from a Terminal, with results:

roberto@roberto-mint ~/Desktop
$ python pydo.py

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   12.1   Copyright 1985-2011 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)


Notes:
      1.  Command line editing enabled

. do /home/roberto/Desktop/pyexample3.do mpg weight foreign 

. clear all

. set more off

. 
. local y `1'

. local x1 `2'

. local x2 `3'

. 
. display `"first parameter: `y'"'
first parameter: mpg

. display `"second parameter: `x1'"'
second parameter: weight

. display `"third parameter: `x2'"'
third parameter: foreign

. 
. sysuse auto
(1978 Automobile Data)

. regress `y' `x1' `x2'

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   69.75
       Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
    Residual |  824.171761    71   11.608053           R-squared     =  0.6627
-------------+------------------------------           Adj R-squared =  0.6532
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
     foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
       _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
------------------------------------------------------------------------------

. 
. exit, STATA clear
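The run above finished without errors. When a command fails, Stata prints a return code on its own line, such as r(111);. Since the process exit status may not always reflect errors inside the do-file, one option is to scan the captured output or log text for those lines. This is a rough sketch: find_stata_errors is a hypothetical helper, and it assumes the output text has already been read into a Python string.

```python
import re

# Matches lines such as "r(111);" that Stata prints after a failed command.
_RC_LINE = re.compile(r"^r\((\d+)\);\s*$", re.MULTILINE)


def find_stata_errors(output_text):
    """Return the list of Stata return codes found in the output text."""
    return [int(code) for code in _RC_LINE.findall(output_text)]


# Example use with text captured from Stata or read from a log file:
#   codes = find_stata_errors(text)
#   if codes:
#       raise RuntimeError("Stata reported error code(s): %s" % codes)
```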

Sources:

http://www.reddmetrics.com/2011/07/15/calling-stata-from-python.html

http://docs.python.org/2/library/subprocess.html

http://www.stata.com/support/faqs/unix/batch-mode/

A different route for using Python and Stata together can be found at

http://ideas.repec.org/c/boc/bocode/s457688.html

http://www.stata.com/statalist/archive/2013-08/msg01304.html

Roberto Ferrer answered Oct 06 '22 01:10