
Parsing pexpect output

I'm trying to parse, in real time, the output of a program that block-buffers its stdout, which means the output is not available until the process ends. All I need is to parse it line by line, filtering and handling the data as it arrives, since the program can run for hours.

I've tried to capture the output with subprocess.Popen() but, as you may guess, Popen can't handle this kind of behavior: it keeps buffering until the process ends.

from subprocess import Popen, PIPE

p = Popen("my noisy stuff", shell=True, stdout=PIPE, stderr=PIPE)
for line in p.stdout.readlines():
    pass  # parsing text and getting data

So I found pexpect, which prints the output in real time, since it treats stdout as a file. I could even do a dirty trick: print to a file and parse it outside the function. But OK, that is too dirty, even for me ;)

import pexpect
import sys

pexpect.run("my noisy stuff", logfile=sys.stdout)

But I guess there should be a better, more Pythonic way to do this: just manage the stdout the way subprocess.Popen does. How can I do this?

EDIT:

Running J.F.'s proposal:

This is a deliberately wrong audit; it takes about 25 seconds to stop.

from subprocess import Popen, PIPE

command = "bully mon0 -e ESSID -c 8 -b aa:bb:cc:dd:ee:00 -v 2"

p = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)

for line in iter(p.stdout.readline, b''):
    print "inside loop"
    print line

print "outside loop"
p.stdout.close()
p.wait()


#$ sudo python SCRIPT.py
                                ### <= 25 secs later......
# inside loop
#[!] Bully v1.0-21 - WPS vulnerability assessment utility

#inside loop
#[!] Using 'ee:cc:bb:aa:bb:ee' for the source MAC address

#inside loop
#[X] Unable to get a beacon from the AP, possible causes are

#inside loop
#[.]    an invalid --bssid or -essid was provided,

#inside loop
#[.]    the access point isn't on channel '8',

#inside loop
#[.]    you aren't close enough to the access point.

#outside loop

Using this method instead (EDIT: due to long delays and timeouts in the output, I had to adjust the child and add some hacks, so the final code looks like this):

import pexpect

child = pexpect.spawn(command)
child.maxsize = 1  #Turns off buffering
child.timeout = 50 # default is 30, insufficient for me. Crashes were due to this param.
for line in child:
    print line,

child.close()

Gives back the same output, but it prints lines in real time. So... SOLVED Thanks @J.F. Sebastian

asked Nov 01 '22 10:11 by peluzza


1 Answer

.readlines() reads all lines. No wonder you don't see any output until the subprocess ends. You could use .readline() instead to read line by line as soon as the subprocess flushes its stdout buffer:

from subprocess import Popen, PIPE

p = Popen("my noisy stuff", stdout=PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
    pass  # process line here
p.stdout.close()
p.wait()
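
As a rough, self-contained sketch of the readline loop above (written for Python 3, stdlib only): `stream_lines` and `CHILD` are hypothetical names introduced here, and `CHILD` is only a stand-in for the noisy program. It flushes after every line, which is the case where a plain pipe is enough. A child that block-buffers its stdout would still deliver everything late with this approach.

```python
import subprocess
import sys


def stream_lines(argv):
    """Yield each stdout line of argv as soon as the child flushes it."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        # readline() returns b"" only at EOF, which ends the iteration.
        for raw in iter(proc.stdout.readline, b""):
            yield raw.decode(errors="replace").rstrip("\r\n")
    finally:
        proc.stdout.close()
        proc.wait()


# Hypothetical stand-in for the noisy program: it flushes after every line.
CHILD = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    print('[!] line', i)\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.05)\n"
)

if __name__ == "__main__":
    for line in stream_lines([sys.executable, "-c", CHILD]):
        if line.startswith("[!]"):  # filter while the child is still running
            print("got:", line)
```

Passing the command as a list avoids shell=True; the filtering inside the loop is where the real parsing would go.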

If you already have pexpect, you could use it to work around the block-buffering issue:

import pexpect

child = pexpect.spawn("my noisy stuff", timeout=None)
for line in child: 
    pass  # process line here
child.close()

See also the stdbuf- and pty-based solutions from the question I've linked in the comments.
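
For illustration, a pty-based variant might look roughly like the sketch below. It is not the code from the linked question: `iter_pty_lines` is a hypothetical helper, it uses only the standard library, and it assumes a POSIX system and Python 3. The idea is that a child whose stdout is a terminal usually switches to line buffering by itself, which is also what pexpect relies on.

```python
import os
import pty
import subprocess
import sys


def iter_pty_lines(argv):
    """Run argv with stdout attached to a pseudo-terminal and yield its lines."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave_fd, close_fds=True)
    os.close(slave_fd)  # the parent only reads from the master side
    buf = b""
    try:
        while True:
            try:
                chunk = os.read(master_fd, 1024)
            except OSError:
                break  # Linux raises EIO once the child has closed the pty
            if not chunk:
                break  # other platforms signal EOF with an empty read
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                yield line.rstrip(b"\r")  # the tty turns "\n" into "\r\n"
        if buf:
            yield buf  # trailing output without a final newline
    finally:
        os.close(master_fd)
        proc.wait()


if __name__ == "__main__":
    for line in iter_pty_lines([sys.executable, "-c", "print('a'); print('b')"]):
        print(line)
```

The stdbuf route needs no extra code: prefixing the command with `stdbuf -oL` (GNU coreutils) asks for line-buffered stdout, although it has no effect on programs that manage their own buffering.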

answered Nov 09 '22 13:11 by jfs