Capturing terminal output into pandas dataframe without creating external text file

I am using ffmpeg's extract_mvs example program to generate some text information. I would use a command like this in the terminal:

./extract_mvs input.mp4 > output.txt

I would like to run this command from Python with Popen or another subprocess function so that, instead of writing output.txt, the data is passed straight to a pandas dataframe without generating the text file.

The idea is to automate this for many inputs, so I am trying to avoid generating many .txt files and having to open() them one by one.

I thought of something like this:

import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
df = pd.read_csv(a.communicate()[0], sep=',')

But then I get an error: OSError: Expected file path name or file-like object, got <class 'bytes'> type

Can it be fixed and extended so as to read straight from subprocess to pandas?

asked Feb 23 '18 11:02

tavalendo

2 Answers

I found a workaround that combines part of Keith's answer with the one found here: decode the bytes returned by the subprocess, then wrap the resulting string in StringIO so that pandas can read it like a file.

The final working code is:

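import subprocess
import sys

import pandas as pd

# StringIO lives in a different module on Python 2
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# communicate() returns (stdout, stderr) as bytes; decode stdout to text
# and wrap it in a file-like object that read_csv can consume
b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")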
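
Since the goal is to repeat this for many videos, the same idea can be wrapped in a function. The following is only a minimal sketch, not something tested against extract_mvs itself: the helper name command_to_dataframe is hypothetical, and it assumes the command prints comma-separated text with a header row on stdout. It uses subprocess.run (Python 3.5+) and io.BytesIO, so no explicit decode is needed, because read_csv accepts binary file-like objects. A child Python process stands in for ./extract_mvs so the example runs anywhere.

```python
import io
import subprocess
import sys

import pandas as pd


def command_to_dataframe(cmd, **read_csv_kwargs):
    """Run cmd and parse its stdout as CSV, without a temporary file.

    Hypothetical helper: assumes cmd writes CSV text to stdout.
    """
    # check=True raises CalledProcessError if the command exits non-zero
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    # result.stdout is bytes; BytesIO makes it a file-like object for pandas
    return pd.read_csv(io.BytesIO(result.stdout), **read_csv_kwargs)


# Stand-in command so the sketch runs anywhere: Python prints a tiny CSV.
demo_cmd = [sys.executable, "-c", "print('x,y'); print('1,2'); print('3,4')"]
demo_df = command_to_dataframe(demo_cmd)
print(demo_df)
```

For the real case, calling command_to_dataframe(['./extract_mvs', name]) for each video name and combining the results with pd.concat(frames, ignore_index=True) should give one dataframe for all inputs, with no .txt files involved.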

answered Sep 28 '22 02:09

tavalendo

Updated answer:

The more I think about your question and the output from the first answer I suggested, the more I think the problem is less a decoding issue than a failure to give pd.read_csv() the right kind of input. As an alternative, you could skip pd.read_csv() entirely and read the subprocess output line by line into a dataframe.

Something like this:

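import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Decode each line and split it into fields (all values stay strings);
# this assumes the first line of output is a comma-separated header row
rows = [line.decode('utf-8').rstrip('\r\n').split(',') for line in a.stdout]
a.wait()

df = pd.DataFrame(rows[1:], columns=rows[0])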

Again, I haven't tested this code myself (because I'm traveling and using my phone right now), but I hope this gets you a little closer to a solution.
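
A related option is to hand the pipe itself to pandas, since read_csv accepts any file-like object with a read() method. This is a minimal sketch under assumptions: a child Python process that prints a small CSV stands in for ['./extract_mvs', 'input.mp4'], and the variable names are made up for illustration.

```python
import subprocess
import sys

import pandas as pd

# Stand-in for ['./extract_mvs', 'input.mp4'] so the sketch is runnable:
# a child Python process that prints a small CSV to stdout.
cmd = [sys.executable, "-c", "print('x,y'); print('1,2'); print('3,4')"]

with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
    # proc.stdout is a binary file-like object, which read_csv accepts
    streamed_df = pd.read_csv(proc.stdout)

# Leaving the with-block closes the pipe and waits for the process
print(proc.returncode)
print(streamed_df)
```

Compared with the line-by-line loop, this lets pandas infer numeric column types and handle the header itself, at the cost of less control over individual lines.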

Original answer:

I haven't tested this, but I think you just need to decode the results returned by your subprocess. Specifically, you need to decode the output from bytes to a string using UTF-8.

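You can try: pd.read_csv(a.communicate()[0].decode('utf-8')). Note that pd.read_csv() treats a plain string as a file path, so the decoded text still has to be wrapped in io.StringIO before it is passed in.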

answered Sep 28 '22 02:09

Keith Dowd

Tags: python, terminal, pandas, ffmpeg, popen
