Python equivalent of find2perl

Perl has a lovely little utility called find2perl that will translate (quite faithfully) a command line for the Unix find utility into a Perl script to do the same.

If you have a find command like this:

find /usr -xdev -type d -name '*share'

                        ^^^^^^^^^^^^^^ => name matches the glob pattern '*share'
                ^^^^^^^ => directory (not a file)
          ^^^^^ => do not descend into other file systems
     ^^^^ => the /usr directory (could be multiple directories)

It finds all directories below /usr whose names end in share.

Now run find2perl /usr -xdev -type d -name '*share' and it will emit a Perl script to do the same. You can then adapt the script to your needs.

Python has os.walk(), which certainly provides the needed functionality (recursive directory listing), but there are big differences in behavior.
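As a rough illustration, here is what a hand translation of the find /usr -xdev -type d -name '*share' command above might look like with os.walk(). This is only a sketch: find_dirs is a hypothetical name, fnmatch is assumed to be a close enough stand-in for find's -name matching, and error handling (unreadable or vanishing directories) is left out.

```python
import fnmatch
import os


def find_dirs(root, pattern, xdev=True):
    """Yield real directories at or below root whose name matches pattern."""
    root_dev = os.lstat(root).st_dev
    # find also tests the starting point itself, not just what lies below it.
    if os.path.isdir(root) and not os.path.islink(root):
        if fnmatch.fnmatchcase(os.path.basename(os.path.normpath(root)), pattern):
            yield root
    for path, dirs, files in os.walk(root):
        keep = []
        for name in dirs:
            full = os.path.join(path, name)
            if os.path.islink(full):
                continue  # -type d does not match symlinks to directories
            if fnmatch.fnmatchcase(name, pattern):
                yield full
            # -xdev: report a mount point but do not descend into it
            if not xdev or os.lstat(full).st_dev == root_dev:
                keep.append(name)
        dirs[:] = keep  # prune in place so os.walk() skips the rest
```

Something like list(find_dirs('/usr', '*share')) could then be compared against the output of the real command.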

Take the simple case of find . -type f -print to find and print all files under the current directory. A naïve implementation using os.walk() would be:

import os

root = '.'  # starting directory, as in "find ."
for path, dirs, files in os.walk(root):
    for file in files:
        print(os.path.join(path, file))

However, this will produce different results than typing find . -type f -print in the shell.

I have also been testing various os.walk() loops against:

import shlex
import subprocess

root = '.'  # directory to search
# create a pipe to 'find' with 'root' as its argument
find_cmd = 'find %s -type f' % root
args = shlex.split(find_cmd)
p = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
out, err = p.communicate()
out = out.rstrip()            # remove terminating \n
for line in out.splitlines():
    print(line)

The difference is that os.walk() lists symbolic links to files among the files; find -type f skips these.

So a correct implementation that matches find . -type f -print becomes:

import os

for path, dirs, files in os.walk(root):
    for file in files:
        p = os.path.join(path, file)
        if os.path.isfile(p) and not os.path.islink(p):
            print(p)

Since there are hundreds of permutations of find primaries, each with different side effects, testing every variant becomes time consuming. Since find is the gold standard in the POSIX world for how to count files in a tree, doing it the same way in Python is important to me.
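One way to cut down the manual testing is a small harness that runs both implementations on the same tree and reports the differences. The sketch below covers only the -type f case; the function names are hypothetical, and it assumes a find on the PATH that supports -print0 (GNU and BSD find do).

```python
import os
import shutil
import subprocess


def walk_regular_files(root):
    """Paths of regular files under root, skipping symlinks (like -type f)."""
    result = set()
    for path, dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(path, name)
            if os.path.isfile(p) and not os.path.islink(p):
                result.add(p)
    return result


def find_regular_files(root):
    """The same set as reported by the external find command."""
    out = subprocess.check_output(["find", root, "-type", "f", "-print0"])
    return {os.fsdecode(p) for p in out.split(b"\0") if p}


def compare(root):
    """Return (only_in_python, only_in_find); two empty sets mean agreement."""
    ours = walk_regular_files(root)
    theirs = find_regular_files(root)
    return ours - theirs, theirs - ours
```

Running compare('.') on a few representative trees (with symlinks, FIFOs, odd file names) gives a quick regression check each time the Python side changes.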

So is there an equivalent of find2perl that can be used for Python? So far I have just been using find2perl and then manually translating the Perl code. This is hard because the Perl file test operators sometimes behave differently from the Python file tests in os.path.
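Part of the mismatch is that os.path.isfile() and os.path.isdir() follow symbolic links, whereas find's -type primary (without -L) examines the link itself. One way to sidestep os.path is to test the mode returned by os.lstat() directly. A minimal sketch, in which TYPE_TESTS and has_type are hypothetical names:

```python
import os
import stat

# find -type letter -> test on the st_mode of an os.lstat() result
TYPE_TESTS = {
    "f": stat.S_ISREG,   # regular file
    "d": stat.S_ISDIR,   # directory
    "l": stat.S_ISLNK,   # symbolic link
    "p": stat.S_ISFIFO,  # named pipe
    "s": stat.S_ISSOCK,  # socket
    "b": stat.S_ISBLK,   # block device
    "c": stat.S_ISCHR,   # character device
}


def has_type(path, letter):
    """True if path itself (not its link target) has the given find type."""
    return TYPE_TESTS[letter](os.lstat(path).st_mode)
```

With this, a symlink to a regular file is reported as type l and not as type f, which is the behavior the os.path.islink() check above had to add by hand.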

dawg asked Sep 24 '11 20:09


3 Answers

If you're trying to reimplement all of find, then yes, your code is going to get hairy. find is pretty hairy all by itself.

In most cases, though, you're not trying to replicate the complete behavior of find; you're performing a much simpler task (e.g., "find all files that end in .txt"). If you really need all of find, just run find and read the output. As you say, it's the gold standard; you might as well just use it.

I often write code that reads paths on stdin just so I can do this:

find ...a bunch of filters... | my_python_code.py
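The receiving end of such a pipe can be very small. A sketch of what my_python_code.py might contain, assuming one path per line (which breaks on file names containing newlines); read_paths is a hypothetical helper and the StringIO object only stands in for sys.stdin here:

```python
import io


def read_paths(stream):
    """Yield one path per input line, without the trailing newline."""
    for line in stream:
        path = line.rstrip("\n")
        if path:
            yield path


# Stand-in for the pipe; in the real script pass sys.stdin instead.
demo = io.StringIO("./a.txt\n./dir/b.txt\n")
for path in read_paths(demo):
    print(path)  # replace with the real per-file work
```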
larsks answered Oct 21 '22 15:10


There are a couple of observations and several pieces of code to help you on your way.

First, Python can execute code in this form just like Perl:

 cat code.py | python | the rest of the pipe story...

find2perl is a clever code generator that emits a Perl function based on a template of find. Therefore, replicate this template and you will not have the "hundreds of permutations" that you are perceiving.

Second, the results from find2perl are not perfect, just as there are differences between versions of find, such as GNU and BSD.

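Third, traversal order matters. os.walk is top down by default (topdown=True), like find without -depth; os.walk(..., topdown=False) is bottom up, like find -depth. Mixing the two makes for different results, especially if your underlying directory tree is changing while you recurse it.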
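The traversal order is easy to check directly. A small sketch comparing os.walk() with topdown=True and topdown=False on a throwaway tree (walk_order is a hypothetical helper):

```python
import os
import tempfile


def walk_order(root, topdown):
    """Directory paths relative to root, in the order os.walk() visits them."""
    return [
        os.path.relpath(path, root)
        for path, dirs, files in os.walk(root, topdown=topdown)
    ]


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    pre = walk_order(root, topdown=True)
    post = walk_order(root, topdown=False)

print(pre)   # ['.', 'a', 'a/b'] on POSIX
print(post)  # ['a/b', 'a', '.'] on POSIX
```

The first order is what find produces by default; the second corresponds to find -depth, where a directory's contents are processed before the directory itself.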

There are two projects in Python that may help you: twander and dupfinder. Each strives to be OS independent and each recurses the file system like find.

If you template a general find-like function in Python, have os.walk recurse top down, use glob to replicate shell expansion, and use some of the code that you find in those two projects, you can replicate find2perl without too much difficulty.

Sorry I could not point to something ready to go for your needs...

the wolf answered Oct 21 '22 16:10


I think glob could help in your implementation of this.

Benjamin answered Oct 21 '22 16:10