How to normalize a path in Perl? (without checking the filesystem)

I want Perl's equivalent of Python's os.path.normpath():

Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. […]

For instance, I want to convert '/a/../b/./c//d' into '/b/c/d'.

The path I'm manipulating does NOT represent a real directory in the local file tree. There are no symlinks involved. So a plain string manipulation works fine.

I tried Cwd::abs_path and File::Spec, but they don't do what I want.

my $path = '/a/../b/./c//d';

File::Spec->canonpath($path);
File::Spec->rel2abs($path, '/');
# Both return '/a/../b/c/d'.
# They don't remove '..' because it might change
# the meaning of the path in case of symlinks.

Cwd::abs_path($path);
# Returns undef.
# This checks for the path in the filesystem, which I don't want.

Cwd::fast_abs_path($path);
# Gives an error: No such file or directory

Possibly related link:

  • Normalized directory paths - perlmonks: people discuss several approaches.

Denilson Sá Maia asked Aug 11 '17 09:08


3 Answers

Given that File::Spec is almost what I needed, I ended up writing a function that removes ../ from File::Spec->canonpath(). The full code including tests is available as a GitHub Gist.

use File::Spec;

sub path_normalize_by_string_manipulation {
    my $path = shift;

    # canonpath does string manipulation, but does not remove "..".
    my $ret = File::Spec->canonpath($path);

    # Let's remove ".." by using a regex.
    while ($ret =~ s{
        (^|/)              # Either the beginning of the string, or a slash, save as $1
        (                  # Followed by one of these:
            [^/]|          #  * Any one character (except slash, obviously)
            [^./][^/]|     #  * Two characters where
            [^/][^./]|     #    they are not ".."
            [^/][^/][^/]+  #  * Three or more characters
        )                  # Followed by:
        /\.\./             # "/", followed by "../"
        }{$1}x
    ) {
        # Repeat this substitution until not possible anymore.
    }

    # Re-adding the trailing slash, if needed.
    if ($path =~ m!/$! && $ret !~ m!/$!) {
        $ret .= '/';
    }

    return $ret;
}

Denilson Sá Maia answered Nov 14 '22 22:11


My use case was normalizing include paths inside files relative to another path. For example, I might have a file at '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng' that includes the following file relative to itself:

<include href="../../base/rng/topicMod.rng"/>

and I needed the absolute path of that included file. (The including file path might be absolute or relative.)

Path::Tiny was promising, but I can only use core modules.

I tried using chdir to move to the including file's location and then File::Spec->rel2abs() to resolve the path, but that was painfully slow on my system.

I ended up writing a subroutine that implements a simple string-based method of collapsing leading '../' components:

#!/usr/bin/perl
use strict;
use warnings;

use Cwd;
use File::Basename;
use File::Spec;

sub adjust_local_path {
 my ($file, $relative_to) = @_;
 return Cwd::realpath($file) if (($relative_to eq '.') || ($file =~ m!^\/!));  # handle the fast cases

 $relative_to = dirname($relative_to) if (-f $relative_to);
 $relative_to = Cwd::realpath($relative_to);
 while ($file =~ s!^\.\./!!) { $relative_to =~ s!/[^/]+$!!; }
 return File::Spec->catdir($relative_to, $file);
}

# The including file is the base; the included file is the relative href.
my $source_file = '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/topic.rng';
my $included_file = '.././base/rng/topicMod.rng';
print adjust_local_path($included_file, $source_file)."\n";

The result of the script above is

$ ./test.pl
/home/me/dita-ot-3.1.3/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/base/rng/topicMod.rng

Using realpath() had the nice side-effect of resolving symlinks, which I needed. In the example above, dita-ot/ is a link to dita-ot-3.1.3/.

You can provide either a file or a directory as the second argument; if it's a file, the directory containing that file is used. (This was convenient for my own purposes.)
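The loop above only strips '../' at the start of the include path. If symlink resolution is not needed, the same resolution can be sketched with string operations alone, which also handles '..' in the middle of the path. This is a minimal sketch, not the code I actually used: resolve_href() is a hypothetical name, and it assumes the including file is given as an absolute path.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: resolve $href against the directory of $including_file
# by string manipulation only (no filesystem access, no symlink resolution).
# Assumption: $including_file is an absolute path.
sub resolve_href {
    my ($href, $including_file) = @_;

    # An absolute href ignores the including file entirely.
    my $base = $href =~ m{^/} ? '' : dirname($including_file);

    my @stack;
    for my $part (split m{/+}, "$base/$href") {
        next if $part eq '' || $part eq '.';       # "//" and "/./" add nothing
        if ($part eq '..') { pop @stack; next; }   # ".." above "/" is dropped
        push @stack, $part;
    }
    return '/' . join('/', @stack);
}

print resolve_href(
    '../../base/rng/topicMod.rng',
    '/home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/technicalContent/rng/concept.rng',
), "\n";
# prints /home/me/dita-ot/plugins/org.oasis-open.dita.v1_3/rng/base/rng/topicMod.rng
```

Unlike the realpath() version, this leaves the dita-ot/ symlink unresolved, so it only fits cases where the literal path is what you want.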

chrispitude answered Nov 15 '22 00:11


Fixing Tom van der Woerdt's code:

foreach my $path ("/a/b/c/d/../../../e" , "/a/../b/./c//d") {
    my @c= reverse split m@/@, $path;
    my @c_new;
    while (@c) {
        my $component= shift @c;
        next unless length($component);
        if ($component eq ".") { next; }
        if ($component eq "..") { 
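            my $i=0;
            # Skip empty, "." and ".." components to find the name to cancel.
            # The bound check stops a ".." above the root from looping forever.
            while ($i < @c && $c[$i] =~ m/^\.{0,2}$/) {
                $i++;
            }
            splice(@c, $i, 1) if $i < @c;
            next;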
        }
        push @c_new, $component;
    }
    print "/".join("/", reverse @c_new) ."\n";
}
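
The same idea can be packaged as a reusable function. The following is a minimal sketch rather than a drop-in replacement for the loop above: it walks the components left to right with a stack instead of reversing the list, and normalize_path() is a hypothetical name, not an existing library function. It assumes '/' is the only separator and, like Python's normpath(), keeps leading '..' components of relative paths.

```perl
use strict;
use warnings;

# normalize_path() is a hypothetical helper, not part of File::Spec or Cwd.
# Pure string manipulation: the filesystem is never consulted.
sub normalize_path {
    my ($path) = @_;
    my $is_abs = ($path =~ m{^/});
    my @stack;
    for my $part (split m{/+}, $path) {
        next if $part eq '' || $part eq '.';   # "//" and "/./" add nothing
        if ($part ne '..') {
            push @stack, $part;
        }
        elsif (@stack && $stack[-1] ne '..') {
            pop @stack;                        # ".." cancels the previous name
        }
        elsif (!$is_abs) {
            push @stack, '..';                 # relative path: keep leading ".."
        }
        # absolute path: a ".." above "/" is silently dropped
    }
    my $joined = join '/', @stack;
    return "/$joined" if $is_abs;
    return length $joined ? $joined : '.';
}

print normalize_path($_), "\n"
    for '/a/b/c/d/../../../e', '/a/../b/./c//d', '../x/../y';
# prints /a/e, /b/c/d and ../y, one per line
```

A trailing slash in the input is dropped; re-add it afterwards if your use case needs it.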

Georg Mavridis answered Nov 14 '22 23:11