
How do I parse a listing of files to get just the filenames in Python?

So let's say I'm using Python's ftplib to retrieve a list of log files from an FTP server. How would I parse that list of files to get just the file names (the last column) into a list? See the link above for example output.

Lawrence Johnston asked Oct 26 '08 07:10




2 Answers

Using retrlines() probably isn't the best idea there, since by default it just prints each line to the console, so you'd have to do tricky things to even get at that output. A better bet is likely the nlst() method, which returns exactly what you want: a list of the file names.
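As a minimal sketch of the nlst() approach: the helper name fetch_file_names, the host and the directory are placeholders invented for illustration, not part of the question. What nlst() returns also depends on the server; some include directories or prefix each name with the path.

```python
from ftplib import FTP  # standard library FTP client


def fetch_file_names(ftp: FTP, directory: str = "."):
    """Return the names the server reports for `directory` (hypothetical helper)."""
    # nlst() sends the NLST command and returns the reply as a list of strings
    return ftp.nlst(directory)


# Hypothetical usage; the host and directory below are placeholders:
# with FTP("ftp.example.com") as ftp:
#     ftp.login()
#     print(fetch_file_names(ftp, "logs"))
```

Passing the connection in as a parameter keeps the helper easy to try against a stand-in object before pointing it at a real server.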

James Bennett answered Sep 20 '22 17:09




You may want to use ftp.nlst() instead of ftp.retrlines(). It will give you exactly what you want.

If you can't, read on:

Generators for sysadmin processes

In his now-famous tutorial, Generator Tricks for Systems Programmers: An Introduction, David M. Beazley gives many recipes for solving this kind of data problem with quick, reusable code.

For example:

# assumes `ftp` is an already connected and logged-in ftplib.FTP instance
# empty list that will receive every line of the listing
log = []
# pass log.append as the callback so retrlines stores each line instead of printing it
# (we only do this because we cannot use something better than retrlines)
ftp.retrlines('LIST', callback=log.append)
# rsplit with maxsplit=1 only splits off the last field, so it does less work on long lines
files = (line.rsplit(None, 1)[1] for line in log)
# turn the generator into your file list
files_list = list(files)

Why don't we build the list immediately?

Well, it's because doing it this way gives you a lot of flexibility: you can apply any intermediate generator to filter the files before turning them into files_list. It's just like a pipe: add a line and you add a processing step without overhead (since they are generators). And if you get rid of retrlines, it still works, and it's even better because you never store the whole list even once.
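To make the pipe idea concrete, here is a small sketch with one extra filtering stage. The listing lines and the ".log" filter are made up for illustration; in the real script, `log` would be the list filled by retrlines.

```python
# Hypothetical LIST output, standing in for the `log` list filled by retrlines
log = [
    "-rw-r--r--   1 ftp ftp      1024 Oct 26 07:10 access.log",
    "-rw-r--r--   1 ftp ftp      2048 Oct 26 07:11 error.log",
    "-rw-r--r--   1 ftp ftp       512 Oct 26 07:12 README",
]

# stage 1: take the last whitespace-separated field of each line
names = (line.rsplit(None, 1)[1] for line in log)
# stage 2 (the added line): keep only names ending in ".log"
log_names = (name for name in names if name.endswith(".log"))
# nothing has run yet; list() pulls each line through both stages
files_list = list(log_names)
```

Each stage is lazy, so adding or removing a stage changes one line and no intermediate list is built.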

EDIT: well, I read the comment on the other answer, and it says that this won't work if there is a space in the name.

Cool, this illustrates why this method is handy. If you want to change something in the process, you just change one line. Swap:

files = (line.rsplit(None, 1)[1] for line in log)

and

# split the line, take every item from field 8 onward, then join them back with spaces
files = (' '.join(line.split()[8:]) for line in log)
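One caveat with the join-based line: it collapses runs of spaces inside a name into single spaces. A possible variant, sketched here under the assumption that every line follows the Unix `ls -l` layout with exactly eight fields before the name (the LIST format is not standardized, so this may not hold for your server), limits the number of splits instead. The function name and sample line are hypothetical.

```python
def name_from_list_line(line):
    """Return the name field of a Unix-style LIST line, or None if it has too few fields."""
    # split at most 8 times: the 9th item is the rest of the line, inner spaces intact
    fields = line.split(None, 8)
    return fields[8] if len(fields) == 9 else None


# made-up sample line with a double space inside the file name
sample = "-rw-r--r--   1 ftp ftp      1024 Oct 26 07:10 my  log file.txt"
name = name_from_list_line(sample)
```

It drops into the same pipeline as `files = (name_from_list_line(line) for line in log)`, followed by a stage that skips the None results.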

OK, this may not be obvious here, but for huge batch-processing scripts, it's nice :-)

e-satis answered Sep 20 '22 17:09
