 

bash shell script to find the closest parent directory of several files

Tags:

bash

Suppose the input arguments are the FULL paths of several files, say:

/abc/def/file1
/abc/def/ghi/file2
/abc/def/ghi/file3
  1. How can I obtain the directory name /abc/def in a bash shell script?
  2. How can I obtain only file1, /ghi/file2, and /ghi/file3?
Chang asked Sep 09 '12 16:09


2 Answers

Given the answer for part 1 (the common prefix), the answer for part 2 is straightforward: you slice the prefix off each name, which could be done with sed, among other options.
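A minimal sketch of the sed route, assuming the prefix is already known and contains no "|" and no characters special to sed regular expressions (it is pasted straight into the sed expression), and that no name contains a newline. The helper name strip_prefix_sed is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip_prefix_sed PREFIX NAME...
# Assumes PREFIX has no "|" or regex metacharacters and names have no newlines.
strip_prefix_sed()
{
    local prefix=$1
    shift
    # One name per line; delete "PREFIX/" only where it starts the line.
    printf '%s\n' "$@" | sed "s|^$prefix/||"
}

strip_prefix_sed /abc/def /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
```

For the question's input this prints file1, ghi/file2 and ghi/file3, one per line; a name that does not start with the prefix passes through unchanged.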

The interesting part, then, is finding the common prefix. The minimum common prefix is / (for /etc/passwd and /bin/sh, for example). The maximum common prefix is (by definition) present in all the strings, so we simply need to split one of the strings into segments, and compare possible prefixes against the other strings. In outline:

split name A into components
known_prefix="/"
for each extra component from A
do
    possible_prefix="$known_prefix/$extra/"
    for each name
    do
        if $possible_prefix is not a prefix of $name
        then ...all done...break outer loop...
        fi
    done
    ...got here...possible prefix is a prefix!
    known_prefix=$possible_prefix
done
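The outline translates fairly directly into bash. This is a sketch, not the answer's own code: it assumes absolute paths without embedded newlines, and the function name common_dir_prefix is hypothetical. Requiring a "/" after each candidate (via the case pattern) keeps /abc/de from counting as a prefix of /abc/def, and an empty result falls back to /:

```shell
#!/usr/bin/env bash
# Hypothetical common_dir_prefix NAME...: follows the outline above.
# Assumes absolute paths with no newlines in them.
common_dir_prefix()
{
    local known_prefix="" possible_prefix extra name
    local -a components
    # Split name A on "/"; the empty leading field is skipped below.
    IFS=/ read -r -a components <<< "$1"
    for extra in "${components[@]}"
    do
        [ -n "$extra" ] || continue
        possible_prefix="$known_prefix/$extra"
        for name in "$@"
        do
            # Quoted part matches literally; the "/" forces whole components.
            case $name in
                "$possible_prefix"/*) ;;
                *) break 2 ;;    # all done: leave both loops
            esac
        done
        known_prefix=$possible_prefix    # possible prefix is a prefix!
    done
    printf '%s\n' "${known_prefix:-/}"
}

common_dir_prefix /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
```

With the question's input this prints /abc/def; for /etc/passwd and /bin/sh it prints /.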

There are some administrivial details to deal with, such as spaces in names. Also, what is the permitted weaponry? The question is tagged bash, but which external commands are allowed (Perl, for example)?

One issue is undefined. Suppose the list of names was:

/abc/def/ghi
/abc/def/ghi/jkl
/abc/def/ghi/mno

Is the longest common prefix /abc/def or /abc/def/ghi? I'm going to assume that the longest common prefix here is /abc/def. (If you really wanted it to be /abc/def/ghi, then use /abc/def/ghi/. for the first of the names.)
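The /. workaround works because the trailing /. adds one more component that names the same directory, so stripping the final component (which is all dirname does) yields /abc/def/ghi itself rather than its parent:

```shell
#!/usr/bin/env bash
# dirname removes only the last component of its argument.
dirname /abc/def/ghi/.    # prints /abc/def/ghi
dirname /abc/def/ghi      # prints /abc/def
```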

Also, there are invocation details:

  • How is this function or command invoked?
  • How are the values returned?
  • Is this one or two functions or commands (longest_common_prefix and path_without_prefix)?

Two commands are easier:

  • prefix=$(longest_common_prefix name1 [name2 ...])
  • suffix=$(path_without_prefix /pre/fix /pre/fix/to/file [...])

The path_without_prefix command removes the prefix if it is present, leaving the argument unchanged if the prefix does not start the name.

longest_common_prefix

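longest_common_prefix()
{
    local -a names parts
    local i=0 name x prefix lead

    names=("$@")
    name="$1"
    # Collect the ancestors of the first name, deepest first. Stopping at
    # "." as well as "/" prevents an endless loop when the first name is
    # relative (dirname . prints .).
    while x=$(dirname "$name"); [ "$x" != "/" ] && [ "$x" != "." ]
    do
        parts[$i]="$x"
        i=$((i + 1))
        name="$x"
    done

    # Try the deepest candidate first, falling back to the root.
    for prefix in "${parts[@]}" /
    do
        # Every name must start with "$prefix/" (just "/" for the root);
        # quoting "$lead" makes any glob characters match literally.
        lead="${prefix%/}/"
        for name in "${names[@]}"
        do
            if [ "${name#"$lead"}" = "$name" ]
            then continue 2    # not common: try the next shorter prefix
            fi
        done
        echo "$prefix"
        break
    done
}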

Test:

set -- "/abc/def/file 0" /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 "/abc/def/ghi/file 4"
echo "Test: $@"
longest_common_prefix "$@"
echo "Test: $@" abc/def
longest_common_prefix "$@" abc/def
set --  /abc/def/ghi/jkl /abc/def/ghi /abc/def/ghi/mno
echo "Test: $@"
longest_common_prefix "$@"
set -- /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
echo "Test: $@"
longest_common_prefix "$@"
set -- "/a c/d f/file1" "/a c/d f/ghi/file2" "/a c/d f/ghi/file3"
echo "Test: $@"
longest_common_prefix "$@"

Output:

Test: /abc/def/file 0 /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 /abc/def/ghi/file 4
/abc/def
Test: /abc/def/file 0 /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 /abc/def/ghi/file 4 abc/def
Test: /abc/def/ghi/jkl /abc/def/ghi /abc/def/ghi/mno
/abc/def
Test: /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
/abc/def
Test: /a c/d f/file1 /a c/d f/ghi/file2 /a c/d f/ghi/file3
/a c/d f

path_without_prefix

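path_without_prefix()
{
    local prefix="$1/"
    shift
    local arg
    for arg in "$@"
    do
        # Quote "$prefix" so it is matched literally, not as a glob
        # pattern; printf is safer than echo for names such as -n.
        printf '%s\n' "${arg#"$prefix"}"
    done
}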

Test:

for name in /pre/fix/abc /pre/fix/def/ghi /usr/bin/sh
do
    path_without_prefix /pre/fix $name
done

Output:

abc
def/ghi
/usr/bin/sh
Jonathan Leffler answered Oct 14 '22 10:10


Here is a more "portable" solution, in the sense that it doesn't use bash-specific features. First, define a function to compute the longest common prefix of two paths:

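common_path()
(
  # The body runs in a subshell, so the IFS and "set -f" changes below
  # do not leak into the caller (no need to save and restore IFS).
  lhs=$1
  rhs=$2
  path=
  IFS=/
  set -f    # split $rhs on "/" without expanding glob characters
  for w in $rhs; do
    test "$path" = / && try="/$w" || try="$path/$w"
    # $try must be a whole-component prefix of $lhs: /abc/def is a
    # prefix of /abc/def/ghi but not of /abc/defg. The quoted part of
    # the pattern matches literally.
    case $lhs/ in
      "${try%/}"/*) ;;
      *) break ;;
    esac
    path=$try
  done
  printf '%s\n' "$path"
)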

Then use it on a list of paths of any length:

common_path_all()
{
  local sofar=$1
  shift
  for arg
  do
    sofar=$(common_path "$sofar" "$arg")
  done
  echo "${sofar:-/}"
}

With your input, it gives

$ common_path_all /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
/abc/def

As Jonathan Leffler pointed out, once you have that, the second question is trivial.

Idelic answered Oct 14 '22 11:10