
Passing a pandas DataFrame into a Python subprocess.Popen as an argument

I am attempting to call a Python script from a master script. I need the DataFrame to be generated only once, within the master script, and then passed to the subprocess script as an argument to be used inside the subprocess.

The following is my attempt at writing the required Python master script.

from subprocess import PIPE, Popen
import pandas as pd

test_dataframe = pd.read_excel(r'C:\test_location\file.xlsx',sheetname='Table')

sp = Popen(["python.exe",'C:/capture/test.py'], shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
sp.communicate(test_dataframe)

And here is the error: TypeError: argument 1 must be convertible to a buffer, not DataFrame

This is my first time trying to use the subprocess module, so I am not very good at it yet. Any help will be much appreciated.

python_enthusiast asked Aug 03 '17 17:08



1 Answer

subprocess launches another application. The ways that processes communicate with each other differ significantly from the ways that functions communicate within a single Python program. Your DataFrame has to pass through a non-Python channel (here, the child's stdin pipe), which carries only bytes; that is why communicate() rejects a DataFrame object. So you need to serialize it on one end and deserialize it on the other. For example, you can use the pickle module: sp.communicate(pickle.dumps(test_dataframe)) on one end and pickle.loads(sys.stdin.buffer.read()) on the other (in Python 3 the pickled bytes must be read from sys.stdin.buffer, not from the text-mode sys.stdin). Or you can write your DataFrame as CSV and then parse it again. Or you can use any other format.
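As a rough illustration of the pickle route, here is a minimal sketch for Python 3. The helper name run_child_with and the inlined CHILD_CODE are hypothetical, made up for this example; in the question, the child side would live in C:/capture/test.py instead. The helper accepts any picklable object, so it should work for a DataFrame as long as pandas is importable in the child interpreter too.

```python
import pickle
import subprocess
import sys

# Hypothetical child program, inlined so the sketch is self-contained.
# In the question this would be the body of C:/capture/test.py.
CHILD_CODE = """
import pickle, sys
obj = pickle.loads(sys.stdin.buffer.read())  # raw bytes in, object out
print(type(obj).__name__, len(obj))          # report back over stdout
"""


def run_child_with(obj):
    """Pickle obj, pipe it to a child interpreter's stdin, return its stdout."""
    payload = pickle.dumps(obj)  # bytes, which communicate() accepts
    sp = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],  # no shell=True needed with a list
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = sp.communicate(payload)
    if sp.returncode != 0:
        raise RuntimeError(err.decode())
    return out.decode().strip()
```

With the names from the question, the call would be run_child_with(test_dataframe), and the child would print the object's type name followed by the number of rows. Anything the child needs to send back can be returned the same way, by writing to its stdout.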

Alexey Guseynov answered Sep 19 '22 17:09
