Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Capturing terminal output into pandas dataframe without creating external text file

I am using ffmpeg's extract_mvs file to generate some text information. I would use a command like this in the terminal:

/extract_mvs input.mp4 > output.txt

I would like to use this command with Popen or other subprocess in python such that instead of output.txt, the data is passed straight to a pandas data frame without actually generating the text file.

The idea is to automate this multiple times, so, I am trying to avoid many .txt files from being generated and thus having to open() them one by one.

I thought of something like this:

import subprocess
cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
df = pd.read_csv(a.communicate()[0], sep=',')

But then I get an error: OSError: Expected file path name or file-like object, got <class 'bytes'> type

Can it be fixed and extended so as to read straight from subprocess to pandas?

like image 266
tavalendo Avatar asked Feb 23 '18 11:02

tavalendo


People also ask

How can I print a DataFrame without indexing?

Using DataFrame. to_string() to Print DataFrame without Index. You can use DataFrame. to_string(index=False) on the DataFrame object to print.

How do you make a DataFrame from output in Python?

Dataframe can be created using dataframe() function. The dataframe() takes one or two parameters. The first one is the data which is to be filled in the dataframe table. The data can be in form of list of lists or dictionary of lists.

Can you pickle DataFrame?

DataFrame - to_pickle() functionThe to_pickle() function is used to pickle (serialize) object to file. File path where the pickled object will be stored. A string representing the compression to use in the output file. By default, infers from the file extension in specified path.


2 Answers

I found a workaround, using part of the answer of Keith and the one found here, to pass information from string to pandas dataframe.

The final working code is:

import sys
import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")
like image 177
tavalendo Avatar answered Sep 28 '22 02:09

tavalendo


Updated answer:

The more I think about your question and the output from the first answer I suggested, the more I think your problem is not a decoding issue and is perhaps more a failure to provide the right input to pd.read_csv(). As an alternative you could try skipping pd.read_csv() entirely. Instead, you could try reading the output from the subprocess line by line into a dataframe.

Something like this:

cmd = ['./extract_mvs', 'input.mp4']

df = pd.DataFrame()

a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

for line in a.stdout:
    df = pd.concat([df, line])

a.wait()

Again, I haven't tested this code myself (because I'm traveling and using my phone right now), but I hope this gets you a little closer to a solution.

Original answer:

I haven't tested this, but I think you just need to decode the results returned by the execution of your subprocess. Specifically, you need to decode your results from bytes to utf-8.

You can try: pd.read_csv(a.communicate()[0].decode('utf-8'))

like image 27
Keith Dowd Avatar answered Sep 28 '22 02:09

Keith Dowd