
Python subprocess communicate() yields None when a list of numbers is expected

When I run the following code

from subprocess import call, check_output, Popen, PIPE

gr = Popen(["grep", "'^>'", myfile], stdout=PIPE)
sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout)
gr.stdout.close()
out = sd.communicate()[0]
print out

Where myfile looks like this:

>name len=345
sometexthere
>name2 len=4523
someothertexthere
...
...

I get

None

When the expected output is a list of numbers:

345
4523
...
...

The corresponding command I run in the terminal is

grep "^>" myfile | sed "s/.*len=//" > outfile

So far, I have tried playing around with escaping and quoting in different ways, such as escaping slashes in the sed or adding extra quotation marks for grep, but the combinatorial possibilities there are large.

I have also considered just reading in the file and writing Python equivalents of grep and sed, but the file is very large (although I could always read it line by line), it will always run on UNIX-based systems, and I am still curious about where I went wrong.

Could it be that

sd.communicate()[0]

returns some kind of object (instead of the list of integers) for which None is the type?

I know I can grab the output with check_output in simple cases:

sam = check_output(["samn", "stats", myfile])

but I am not sure how to make it work in more complicated situations where output is being piped between processes.

What are some productive approaches to get the expected results with subprocess?

EKarl asked Dec 24 '15 22:12


1 Answer

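As suggested, you need to pass stdout=PIPE to the second process and remove the single quotes from "'^>'". Without stdout=PIPE, sed writes directly to the terminal and communicate() returns None for stdout. The single quotes are only needed in a shell; with an argument list they are passed to grep as part of the pattern: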

gr = Popen(["grep", "^>", myfile], stdout=PIPE)
sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout, stdout=PIPE)
......
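
For completeness, here is a minimal sketch of the whole pipeline in Python 3. It assumes grep and sed are on the PATH and that every header line ends with len=<integer>, as in the sample file. The function names lens_via_pipeline and lens_via_shell are made up for illustration. The second function shows the check_output route with shell=True, where the shell builds the pipe and the file name is quoted with shlex.quote.

```python
import shlex
from subprocess import Popen, PIPE, check_output


def lens_via_pipeline(path):
    """Run the equivalent of `grep '^>' path | sed 's/.*len=//'` without a shell."""
    # No shell is involved, so the pattern is passed without extra quotes.
    gr = Popen(["grep", "^>", path], stdout=PIPE)
    # stdout=PIPE is what makes communicate() return the output instead of None.
    sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout, stdout=PIPE)
    # Close our copy of grep's stdout so grep can receive SIGPIPE if sed exits early.
    gr.stdout.close()
    out = sd.communicate()[0]
    gr.wait()
    # Assumption: what is left on each line is a bare integer.
    return [int(tok) for tok in out.decode().split()]


def lens_via_shell(path):
    """Same pipeline, but let the shell do the piping (hypothetical helper)."""
    cmd = "grep '^>' {} | sed 's/.*len=//'".format(shlex.quote(path))
    out = check_output(cmd, shell=True)
    return [int(tok) for tok in out.decode().split()]
```

Calling either function on the sample file should give the numbers as a Python list rather than printed text. The shell variant is shorter, but it is only safe if anything interpolated into the command string is quoted.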

But this can be done simply using pure Python and re:

import re
# Raw string avoids the invalid "\>" escape; ">" needs no escaping in a regex.
r = re.compile(r"^>.*len=(.*)$")
with open("test.txt") as f:
    for line in f:
        m = r.search(line)
        if m:
            print(m.group(1))

Which would output:

345
4523

If the lines that start with > always have the number, and the number is always at the end after len=, then you don't actually need a regex either:

with open("test.txt") as f:
    for line in f:
        if line.startswith(">"):
            # strip() drops the trailing newline so print() does not emit blank lines
            print(line.rsplit("len=", 1)[1].strip())
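
Since the question asks for a list of numbers rather than printed text, the same idea can be wrapped in a generator. This is only a sketch under the same assumption as above (each header line ends with len=<integer>); read_lengths is a hypothetical name, not something from the question.

```python
def read_lengths(path):
    """Yield the integer after the last 'len=' on each line starting with '>'."""
    with open(path) as f:
        for line in f:  # iterates line by line, so large files are not loaded whole
            if line.startswith(">") and "len=" in line:
                # int() tolerates the trailing newline and surrounding whitespace.
                yield int(line.rsplit("len=", 1)[1])
```

Use list(read_lengths("test.txt")) to get all values at once, or loop over the generator directly to keep memory use flat on a very large file.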

Padraic Cunningham answered Nov 15 '22 09:11