
Bash to Python: flatten directory tree

Tags:

python

bash

On Unix-like systems I use the script below, and I'd like some help porting it to Python so that it runs on Windows hosts:


#!/bin/bash

SENTINEL_FILENAME='__sentinel__'
SENTINEL_MD5_CHECKSUM=''
SENTINEL_SHA_CHECKSUM=''

function is_directory_to_be_flattened() {

  local -r directory_to_consider="$1"
  local -r sentinel_filepath="${directory_to_consider}/${SENTINEL_FILENAME}"

  if [ ! -f "${sentinel_filepath}" ]; then
    return 1
  fi

  if [[
      "$(
         md5 "${sentinel_filepath}" \
           | awk '{ print $NF }' 2> /dev/null
       )" \
        == "${SENTINEL_MD5_CHECKSUM}"
    && \
      "$(
         shasum -a 512 "${sentinel_filepath}" \
           | awk '{ print $1 }' 2> /dev/null
       )" \
        == "${SENTINEL_SHA_CHECKSUM}"
  ]]; then
    return 0
  else
    return 1
  fi
}

function conditionally_flatten() {

  local -r directory_to_flatten="$1"
  local -r flatten_into_directory="$2"

  if is_directory_to_be_flattened "${directory_to_flatten}"; then

    if [ ! -d "${flatten_into_directory}" ]; then
      mkdir -v "${flatten_into_directory}"
    fi

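    # -print0 with read -d '' keeps filenames containing spaces or newlines intact.
    find "${directory_to_flatten}" -maxdepth 1 -type f -print0 \
      | while IFS= read -r -d '' file_to_move; do
          mv \
            -v \
            -n \
            "${file_to_move}" \
            "${flatten_into_directory}"
        done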
  fi
}

function flatten_directory() {

  local -r directory_to_flatten="$1"
  local -r descend_depth="$2"

  local -r flattened_directory="${directory_to_flatten}/__flattened__"

  if [ ! -d "${directory_to_flatten}" ]; then
    printf "The argument '%s' does not seem to be a directory.\n" \
      "${directory_to_flatten}" \
      >&2
    return
  fi

  find "${directory_to_flatten}" \
    -maxdepth "${descend_depth}" \
    -type d \
  | \
    while IFS= read -r directory_path; do
      conditionally_flatten \
        "${directory_path}" \
        "${flattened_directory}"
    done
}

n_arguments="$#"

if [ "${n_arguments}" -eq 1 ]; then
  flatten_directory "$1" '1' # maybe use a constant, not a "magic #" here?
else
  echo usage: "$0" /path/to/directory/to/flatten
fi

unset is_directory_to_be_flattened
unset conditionally_flatten
unset flatten_directory

How would you port this to Python on Windows? I am a beginner in both Python and Bash scripting.

Feel free to improve my implementation as you port it if you find it lacking in any way, with a justification please. This is not "Code Review", but a "thumbs up/thumbs down" on my Bash effort would give me a sense of whether I am improving or should change the way I study altogether.


Here we go, my attempt in Python: (criticise it hard if need be, it's the only way for me to learn!)


#!/usr/bin/env python2.7

import sys
import os
import shutil

SENTINEL_FILENAME='__sentinel__'
SENTINEL_MD5_CHECKSUM=''
SENTINEL_SHA_CHECKSUM=''

DEFAULT_DEPTH = 1
FLATTED_DIRECTORY_NAME = '__flattened__'

def is_directory_to_be_flattened(directory_to_consider):
  sentinel_location = os.path.join(directory_to_consider, SENTINEL_FILENAME)
  if not os.path.isfile(sentinel_location):
    return False
  import hashlib
  # Binary mode: text mode on Windows would translate line endings and change the digests.
  with open(sentinel_location, 'rb') as sentinel_file:
    file_contents = sentinel_file.read()
    return (hashlib.md5(file_contents).hexdigest() == SENTINEL_MD5_CHECKSUM
      and hashlib.sha512(file_contents).hexdigest() == SENTINEL_SHA_CHECKSUM)

def flatten(directory, depth, to_directory, do_files_here):
  if depth < 0:
    return
  contained_filenames = [f for f in os.listdir(directory)]
  if do_files_here:
    for filename in contained_filenames:
      if filename == SENTINEL_FILENAME:
        continue
      filepath = os.path.join(directory, filename)
      if not os.path.isfile(filepath):
        continue
      file_to = os.path.join(to_directory, filename)
      if not os.path.isdir(to_directory):
        os.makedirs(to_directory)
      if not os.path.isfile(file_to):
        print "Moving: '{}' -> '{}'".format(filepath, file_to)
        shutil.move(filepath, file_to)
      else:
        sys.stderr.write('Error: {} exists already.\n'.format(file_to))
  next_depth = depth - 1
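  for name in contained_filenames:
    # os.listdir returns bare names; join them with the parent before testing.
    subdirectory = os.path.join(directory, name)
    if os.path.isdir(subdirectory) and is_directory_to_be_flattened(subdirectory):
      flatten(subdirectory, next_depth, to_directory, True)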

def flatten_directory(to_flatten, depth):
  to_directory = os.path.join(to_flatten, FLATTED_DIRECTORY_NAME)
  if not os.path.isdir(to_flatten):
    sys.stderr.write(
      'The argument {} does not seem to be a directory.\n'.format(
      to_flatten))
    return
  flatten(to_flatten, depth, to_directory, False)

def main():
  if len(sys.argv) == 2:
    flatten_directory(sys.argv[1], DEFAULT_DEPTH)
  else:
    print 'usage: {} /path/to/directory/to/flatten'.format(sys.argv[0])

if __name__ == '__main__':
  main()

Although it's obvious from the code, the intent is:

  • Start at a given directory
  • Descend up to a certain depth
  • Consider subdirectories and move all files therein if and only if:
    • The directory contains a "sentinel file" with a given filename
    • The sentinel file is actually a sentinel file, not just a file renamed to the same name
  • Collate files in a __flattened__ directory under the directory in which the search started
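
The sentinel test described in the two sub-bullets above could be sketched in Python 3 roughly as follows. The helper names file_digests and has_valid_sentinel are made up for illustration, and the expected checksums are passed in as arguments because the real values are not shown here. The file is read in binary mode and in chunks, so the digests do not depend on the platform's line-ending handling.

```python
import hashlib
import os

SENTINEL_FILENAME = '__sentinel__'  # name taken from the Bash script


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha512_hex) for the file at path, read in binary mode."""
    md5 = hashlib.md5()
    sha512 = hashlib.sha512()
    with open(path, 'rb') as handle:  # 'rb' avoids newline translation on Windows
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            md5.update(chunk)
            sha512.update(chunk)
    return md5.hexdigest(), sha512.hexdigest()


def has_valid_sentinel(directory, expected_md5, expected_sha512):
    """True only if directory holds a sentinel file whose digests both match."""
    sentinel_path = os.path.join(directory, SENTINEL_FILENAME)
    if not os.path.isfile(sentinel_path):
        return False
    return file_digests(sentinel_path) == (expected_md5, expected_sha512)
```
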
Robottinosino asked Jan 16 '23 09:01


1 Answer

Most file-handling functions in Python live in the "os" module. There you will find os.rename (renames or moves a directory entry), os.listdir (returns the names of the entries in the directory passed as its first argument), os.walk (recursively walks a directory tree) and os.path.walk (does the same with a callback; Python 2 only, it was removed in Python 3). os.path.exists, os.path.isdir and os.mkdir are others that might be handy.
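
As a small illustration of how these calls fit together, here is a sketch (Python 3, though nothing in it is version-specific) that splits one directory level into files and subdirectories. The function name split_entries is hypothetical. The point it shows is that os.listdir returns bare names, which have to be joined with the parent directory before os.path.isdir or os.path.isfile can test them reliably.

```python
import os


def split_entries(directory):
    """Return (files, subdirectories) as full paths for one directory level."""
    files, subdirectories = [], []
    for name in sorted(os.listdir(directory)):
        # os.listdir yields bare names, so join them with the parent first.
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            subdirectories.append(path)
        elif os.path.isfile(path):
            files.append(path)
    return files, subdirectories
```
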

For a "quick and dirty" translation you might also check "os.system", which executes a shell command just as if it had been typed in the shell, and os.popen, which gives access to the stdin or stdout of that process. A more careful translation, though, would use another module, "subprocess", which gives full control over a command run as a subprocess (although a Unix tool such as find, for example, will not be available on a stock Windows install).
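
A minimal sketch of the subprocess route, assuming only the standard library. The wrapper name run_and_capture is made up, and the child process is the Python interpreter itself, chosen purely because it is certain to exist wherever the script runs. Passing the command as a list avoids going through a shell at all.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv (a list, no shell involved) and return its stdout as text."""
    return subprocess.check_output(argv, universal_newlines=True)


# Illustrative command only: ask the current interpreter to print a word.
output = run_and_capture([sys.executable, '-c', 'print("hello")'])
```

check_output raises subprocess.CalledProcessError when the command exits with a non-zero status, so failures do not pass silently the way they can with os.system.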

Other modules of interest are sys (sys.argv holds the arguments passed to the script) and shutil (with things like copy, rmtree and such).

Your script does some error checking, which is trivial to add given the "os" function names above and basic Python, but a short "just do it" script in Python could be simply:

import os, sys

dir_to_flatten = sys.argv[1]

for dirpath, dirnames, filenames in os.walk(dir_to_flatten):
    for filename in filenames:
        try:
            os.rename(os.path.join(dirpath, filename), os.path.join(dir_to_flatten, filename))
        except OSError:
            print ("Could not move %s " % os.path.join(dirpath, filename))
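
The script above moves every file in the tree, with no depth limit and no sentinel test. A variant closer to the question's intent might look like the sketch below (Python 3). It is only one possible shape: the parameters should_flatten (a caller-supplied predicate that decides whether a directory qualifies, for example a sentinel check) and skip_names are assumptions introduced here, not something from the question. Depth is limited by pruning dirnames in place, which os.walk honours in its default top-down mode, and existing files in the destination are never overwritten.

```python
import os
import shutil

FLATTENED_DIRECTORY_NAME = '__flattened__'  # name used in the question


def flatten(root, max_depth=1, should_flatten=lambda directory: True,
            skip_names=()):
    """Move files from qualifying directories into root/__flattened__.

    Returns the destination paths of the files that were actually moved.
    """
    root = os.path.abspath(root)
    destination = os.path.join(root, FLATTENED_DIRECTORY_NAME)
    moved = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into the collection directory itself.
        if dirpath == root and FLATTENED_DIRECTORY_NAME in dirnames:
            dirnames.remove(FLATTENED_DIRECTORY_NAME)
        relative = os.path.relpath(dirpath, root)
        depth = 0 if relative == os.curdir else relative.count(os.sep) + 1
        if depth >= max_depth:
            del dirnames[:]  # pruning in place stops os.walk going deeper
        if not should_flatten(dirpath):
            continue
        for filename in filenames:
            if filename in skip_names:
                continue
            source = os.path.join(dirpath, filename)
            target = os.path.join(destination, filename)
            if os.path.exists(target):
                print('Skipping %s: %s already exists' % (source, target))
                continue
            if not os.path.isdir(destination):
                os.makedirs(destination)
            shutil.move(source, target)
            moved.append(target)
    return moved
```

With max_depth=1 this visits the starting directory and its immediate subdirectories, which mirrors find -maxdepth 1 in the Bash version.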
jsbueno answered Jan 23 '23 10:01