I am using FFmpeg's extract_mvs example program to generate some text information. I would run it in the terminal with a command like this:
./extract_mvs input.mp4 > output.txt
I would like to run this command with Popen or another subprocess function in Python so that, instead of going to output.txt, the data is passed straight to a pandas DataFrame without ever generating the text file.
The idea is to automate this many times, so I am trying to avoid generating many .txt files and then having to open() them one by one.
I thought of something like this:
import subprocess
import pandas as pd
cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
df = pd.read_csv(a.communicate()[0], sep=',')
But then I get an error: OSError: Expected file path name or file-like object, got <class 'bytes'> type
Can it be fixed and extended so that the output is read straight from the subprocess into pandas?
I found a workaround, using part of Keith's answer and the one found here, to pass the information from a string to a pandas DataFrame.
The final working code is:
import sys
import subprocess
import pandas as pd
cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
b = StringIO(a.communicate()[0].decode('utf-8'))
df = pd.read_csv(b, sep=",")
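A minimal Python 3 sketch of the same idea, which also covers running the tool over many inputs without any .txt files. Assumptions: `mvs_cmd` and `run_to_dataframe` are hypothetical helper names, the file names are placeholders, and `mvs_cmd` returns a small Python one-liner that prints CSV in place of `./extract_mvs` so the sketch runs anywhere. `subprocess.run` with `encoding="utf-8"` returns stdout already decoded, so neither a manual `.decode()` nor the Python 2 branch is needed.

```python
import io
import subprocess
import sys

import pandas as pd


def mvs_cmd(path):
    """Hypothetical helper: build the command line for one input file."""
    # Real use would be: return ["./extract_mvs", path]
    # Stand-in so this sketch runs anywhere: a Python one-liner that prints CSV.
    return [sys.executable, "-c", "print('framenum,source\\n1,-1\\n2,1')"]


def run_to_dataframe(cmd):
    """Run cmd and parse its CSV stdout into a DataFrame, with no temp file."""
    # encoding="utf-8" puts the pipe in text mode, so stdout comes back as str;
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, encoding="utf-8", check=True
    )
    # read_csv treats a bare str as a path, so wrap the text in a file-like object.
    return pd.read_csv(io.StringIO(result.stdout))


inputs = ["a.mp4", "b.mp4"]  # placeholder file names
# One DataFrame per input, stacked with the file name as the outer index level.
frames = {path: run_to_dataframe(mvs_cmd(path)) for path in inputs}
df = pd.concat(frames, names=["video", "row"])
print(df)
```

`pd.concat` on a dict uses the keys as the outer index level, so `df.loc["a.mp4"]` selects the rows produced for one video.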
The more I think about your question and the output from the first answer I suggested, the more I think your problem is not a decoding issue but rather a failure to provide the right input to pd.read_csv(). As an alternative, you could try skipping pd.read_csv() entirely and instead read the output from the subprocess line by line into a DataFrame.
Something like this:
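import subprocess
import pandas as pd
cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
rows = []
for line in a.stdout:
    # each line arrives as bytes: decode it and split it into fields
    rows.append(line.decode('utf-8').rstrip('\r\n').split(','))
a.wait()
# assumes the first line is the CSV header; values stay as strings
df = pd.DataFrame(rows[1:], columns=rows[0])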
Again, I haven't tested this code myself (because I'm traveling and using my phone right now), but I hope this gets you a little closer to a solution.
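Another option is to hand the pipe itself to `pd.read_csv()`. `Popen.stdout` is a binary file-like object, which `read_csv()` accepts like any other file handle (assuming a reasonably recent pandas), so neither a line-by-line loop nor an intermediate bytes object is needed. This is a sketch under a stand-in assumption: the hypothetical `PROGRAM` string is a small Python program that prints CSV in place of `./extract_mvs`.

```python
import subprocess
import sys

import pandas as pd

# Stand-in for ./extract_mvs: a small Python program that prints CSV lines.
PROGRAM = "print('framenum,dstx')\nfor i in range(3):\n    print(f'{i},{i * 10}')"
cmd = [sys.executable, "-c", PROGRAM]  # real use: ['./extract_mvs', 'input.mp4']

# Popen.stdout is a binary file-like object, so read_csv can parse it directly.
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
    df = pd.read_csv(proc.stdout)
# Leaving the with-block closes the pipe and waits for the process to finish.
if proc.returncode != 0:
    raise subprocess.CalledProcessError(proc.returncode, cmd)
print(df)
```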
I haven't tested this, but I think you just need to decode the results returned by the execution of your subprocess. Specifically, you need to decode the result from bytes into a UTF-8 string.
You can try:
pd.read_csv(a.communicate()[0].decode('utf-8'))
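Decoding alone may not be enough: `pd.read_csv()` treats a bare `str` as a file path or URL rather than as CSV content, which is why the workaround above wraps the decoded text in `StringIO`. A sketch of a variant that skips decoding entirely: wrap the raw bytes in `io.BytesIO` and let `read_csv()` decode them (its `encoding` parameter defaults to UTF-8). The command here is a stand-in Python one-liner, not `./extract_mvs` itself.

```python
import io
import subprocess
import sys

import pandas as pd

# Stand-in for ['./extract_mvs', 'input.mp4']: prints a small CSV table.
cmd = [sys.executable, "-c", "print('framenum,source\\n1,-1\\n1,1')"]
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
raw, _ = a.communicate()  # raw is bytes, not a path or file-like object

# Wrapping the bytes in BytesIO gives read_csv a binary file-like object;
# read_csv then decodes it itself (encoding defaults to UTF-8).
df = pd.read_csv(io.BytesIO(raw))
print(df)
```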