
Randomly selecting a file from a tree of directories in a completely fair manner

Tags:

python

I'm looking for a way to randomly select a file from a tree of directories such that every file has exactly the same probability of being chosen as every other file. For example, in the following tree, each file should have a 25% chance of being chosen:

  • /some/parent/dir/
    • Foo.jpg
    • sub_dir/
      • Bar.jpg
      • Baz.jpg
      • another_sub/
        • qux.png

My interim solution, which I'm using while I code the rest of the app, is a function like this:

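import os
import random

def random_file(directory):
    # Pick a random entry; if it is a directory, recurse into it.
    path = os.path.join(directory, random.choice(os.listdir(directory)))
    if os.path.isdir(path):
        return random_file(path)
    else:
        return path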

However, this obviously biases the results: a file's chance depends on how deep it sits in the tree and how many siblings share its directory. The files end up with the following probabilities of being selected:

  • /some/parent/dir/
    • Foo.jpg - 50%
    • sub_dir/ (50%)
      • Bar.jpg - 16.7%
      • Baz.jpg - 16.7%
      • another_sub/ (16.7%)
        • qux.png - 16.7%
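
The figures above can be checked with a short sketch. It models the example tree as a nested dict (a hypothetical stand-in for the real directory, with `None` marking a file) and computes the exact probability that the recursive pick-an-entry approach assigns to each file. The names `tree` and `selection_probabilities` are illustrative only, and the sketch assumes no directory is empty.

```python
from fractions import Fraction

# Hypothetical in-memory model of the example tree: dict = directory, None = file.
tree = {
    "Foo.jpg": None,
    "sub_dir": {
        "Bar.jpg": None,
        "Baz.jpg": None,
        "another_sub": {"qux.png": None},
    },
}

def selection_probabilities(node, prob=Fraction(1), prefix=""):
    """Return {file path: probability} for the recursive pick-an-entry strategy."""
    result = {}
    share = prob / len(node)  # every entry in a directory is equally likely
    for name, child in node.items():
        path = prefix + name
        if child is None:
            result[path] = share  # a file keeps its directory's share
        else:
            # a subdirectory passes its share down to its own entries
            result.update(selection_probabilities(child, share, path + "/"))
    return result

probs = selection_probabilities(tree)
for path, p in probs.items():
    print(f"{path}: {p} ({float(p):.1%})")
```

Foo.jpg comes out at 1/2 and each of the other three files at 1/6, rather than the 1/4 each that a fair selection would give.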

The context for the function is in a background rotation app I'm writing, so the ability to filter out unwanted file extensions from being in the results would be a bonus (although I could simply force that by choosing again if it's not the file type I want... that gets messy if there's an abundance of files of the "wrong" type, though).

Graham Lyon asked Jun 20 '11 13:06


1 Answer

To give every file the same probability, you have to enumerate all of them. The simplest way is to build a full list first and then pick from it:

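import os
import random

def random_file(root):
    # Walk the whole tree once and collect every path that is not a .bak file.
    candidates = [os.path.join(path, filename)
                  for path, dirs, filenames in os.walk(root)
                  for filename in filenames
                  if not filename.endswith(".bak")]
    return random.choice(candidates)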
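
If holding every path in memory is a concern, one alternative (not part of the approach above) is reservoir sampling with a reservoir of size one. It still visits every file, but it keeps only the current pick and replaces it with probability 1/n when it sees the n-th matching file. The sketch below is a minimal version under some assumptions: the function name `random_file_streaming` and the default extension whitelist are made up for illustration, and it returns `None` when nothing matches.

```python
import os
import random

def random_file_streaming(root, extensions=(".jpg", ".png")):
    """Pick one matching file uniformly at random in a single pass over the tree."""
    extensions = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    chosen = None
    seen = 0
    for path, dirs, filenames in os.walk(root):
        for filename in filenames:
            # Skip files whose extension is not on the (assumed) whitelist.
            if not filename.lower().endswith(extensions):
                continue
            seen += 1
            # Replace the current pick with probability 1/seen.
            if random.randrange(seen) == 0:
                chosen = os.path.join(path, filename)
    return chosen  # None if no file matched
```

Filtering by a whitelist of extensions inside the loop also avoids having to choose again when a file of the wrong type comes up.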
Sven Marnach answered Sep 20 '22 07:09