 

File extension Python

Tags:

python

If a directory contains both '.m' and '.xml' files, I want the script to find both kinds. At the moment it does not; it falls through to the 'else' branch instead. The argument passed in should be able to match all files in a directory.

python script.py --dir C:\\path\\path\\*.* #This should take all files (doesn't matter what type ex 'm', 'xml' 'txt' etc.).

If the user only wants XML files, they write *.xml, and likewise *.m for '.m' files. Note that when the user asks for only 'xml' or only 'm' files, the script already finds them.

def main(argv):
    args = argumentParser(argv)
    if args.dirname.endswith('.m'):
        overrideM(args)
    elif args.dirname.endswith('.xml'):
        xmlOverride(args)
    elif args.dirname.endswith(('.m', '.xml')): #Can I do like this?
        #Here I want to run both of my function.
        overrideM()
        xmlOverride()
    else:
        print "Error can't find files"

My 'm' function (small part of it)

def overrideM(args):
    for fileName in glob.glob(args.dirname):
        print fileName
        with open(fileName, 'r') as searchFile:
            my_files_content = searchFile.read()
        #...rest of my code

My 'XML' function (small part of it)

def xmlOverride(args):
    for fileName in glob.glob(args.dirname):
        print fileName
        with open(fileName, 'r') as searchFile:
            my_files_content = searchFile.read()
        #...rest of my code
gants asked Nov 08 '22 17:11


1 Answer

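elif args.dirname.endswith(('.m', '.xml')) cannot do what you want. args.dirname has to be a single string (otherwise the earlier endswith calls would raise an error), and a single string cannot end with two different extensions. Also, because the '.m' and '.xml' cases are already caught by the branches above it, that elif can never be reached. If the user wants to select both, you need a collection of extensions, something like: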

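def main(argv):
    # make argumentParser return the requested extensions, e.g. ('.m', '.xml')
    args = argumentParser(argv)
    # sorted() returns a list, so compare against a list, not a tuple
    if sorted(args) == ['.m', '.xml']:
        overrideM()
        xmlOverride()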

A better option is to use a generic function that takes a file extension, and simply iterate over args, passing in each extension:

def main(argv):
    args = argumentParser(argv)
    for ext in args:
        generic_search(ext)
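As a rough sketch of what such a generic function might look like: generic_search is only a placeholder name in the snippet above, so the dirname parameter, the returned list and the search_all wrapper below are all assumptions made for illustration, not the asker's actual code.

```python
import glob
import os


def generic_search(dirname, ext):
    """Return the files directly inside dirname whose names end with ext."""
    # Hypothetical helper: this signature is an assumption, not the asker's code.
    # glob.escape keeps special characters in the directory name literal.
    pattern = os.path.join(glob.escape(dirname), "*" + ext)
    return sorted(glob.glob(pattern))


def search_all(dirname, exts):
    """Run generic_search once per extension and collect the results."""
    found = []
    for ext in exts:
        found.extend(generic_search(dirname, ext))
    return found
```

Calling search_all(some_dir, ['.m', '.xml']) would then handle both extensions without a separate branch for each combination.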

To repeat: args.dirname.endswith(('.m', '.xml')) cannot tell you that both were requested, because one string cannot end in both .m and .xml. I would also take the path as one argument and the extensions to search for as separate arguments. Then you can either glob for each extension individually, or list the files with os.listdir and filter them using str.endswith with a tuple of extensions.

The basic idea would be something like:

from argparse import ArgumentParser
import os

parser = ArgumentParser()
parser.add_argument("path")
parser.add_argument('ext', nargs='*')

args = parser.parse_args()
path = args.path
exts = args.ext

# what your glob is doing
for f in os.listdir(path):
    if f.endswith(tuple(exts)):
        with open(os.path.join(path, f)) as fle:
            print(fle.name)
            # do whatever

If you allow the user to search for multiple extensions, then unless you are doing something very specific in each function, it is better to use endswith and make a single pass over the directory.

You can also combine it with glob if you want to search the immediate subdirectories of path as well as path itself (without recursive=True, ** behaves like *, so this only goes one level down):

from argparse import ArgumentParser
import os
from glob import iglob
from itertools import chain

parser = ArgumentParser()
parser.add_argument("path")
parser.add_argument('ext', nargs='*')

args = parser.parse_args()
path = args.path
exts = args.ext

# iglob already returns paths prefixed with path, so open f directly
for f in chain.from_iterable([iglob(path+"/*"), iglob(path+"/**/*")]):
    if f.endswith(tuple(exts)):
        with open(f) as fle:
            print(fle.name)

Again, this works for multiple file extensions with a single pass over the directories. glob is good for a single pattern, or maybe a couple, but if you have multiple extensions it makes a lot more sense to list the files and filter with endswith.
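If every level of subdirectory needs to be covered, one possible approach is os.walk combined with the same endswith filter. This is only a sketch under that assumption; find_by_extension is a hypothetical name introduced here, not part of the code above.

```python
import os


def find_by_extension(root, exts):
    """Yield paths under root (at any depth) whose names end with one of exts."""
    suffixes = tuple(exts)  # str.endswith accepts a tuple of suffixes
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                # dirpath already starts with root, so join only dirpath and name
                yield os.path.join(dirpath, name)
```

It could be driven from the parsed arguments with something like: for p in find_by_extension(path, exts): print(p). Because only the filenames list is checked, directory names that happen to end in one of the extensions are never opened.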

If you really want to use different logic for each extension, you can pull out the extension and use a dict that maps each extension name to the appropriate function:

from argparse import ArgumentParser
import os
from glob import iglob
from itertools import chain

def xml(f):
    print(f)

def m(f):
    print(f)

def text(f):
    print(f)

mapped = {"m":m, "xml":xml, "text":text}

parser = ArgumentParser()
parser.add_argument("path")
parser.add_argument('ext', nargs='*')

args = parser.parse_args()
path = args.path
exts = args.ext


for f in chain.from_iterable([iglob(path + "/*"), iglob(path + "/**/*")]):
    ext = f.rsplit(".", 1)
    if len(ext) == 2 and ext[1] in mapped:
        mapped[ext[1]](f)

A dict lookup is O(1) on average, so apart from being concise this is also efficient.
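A variation on the same idea, sketched here with os.path.splitext so that dots in directory names cannot be mistaken for an extension, and so that only the extensions the user asked for are dispatched. The handler names, the HANDLERS mapping and the dispatch function are hypothetical and exist only for illustration.

```python
import os


def handle_xml(path):
    return "xml handler: " + path


def handle_m(path):
    return "m handler: " + path


# Hypothetical mapping from extension (including the leading dot) to handler.
HANDLERS = {".xml": handle_xml, ".m": handle_m}


def dispatch(path, wanted_exts):
    """Call the handler for path's extension if it was requested, else return None."""
    ext = os.path.splitext(path)[1]  # e.g. ".xml", or "" when there is no extension
    if ext in wanted_exts and ext in HANDLERS:
        return HANDLERS[ext](path)
    return None
```

Here the keys include the leading dot, which matches how the extensions would be passed on the command line if the user types them as .xml or .m.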

Sample output:

 $ python 3_payg.py  /home/padraic  .xml 
/home/padraic/sitemap.xml
/home/padraic/yacy/build.xml
/home/padraic/graphviz-master/graphviz.appdata.xml
Padraic Cunningham answered Nov 14 '22 23:11