 

Why does boost::filesystem::canonical() require the target path to exist?

The documentation for boost::filesystem::canonical(const path& p) states:

Overview: Converts p, which must exist, to an absolute path that has no symbolic link, dot, or dot-dot elements.
...
Remarks: !exists(p) is an error.

The consequence of this is that if p identifies a symbolic link whose target does not exist, the function fails with a "file not found" error and does not return a path.
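To see the failure concretely, here is a minimal sketch using C++17 std::filesystem. Its canonical() carries the same "p must exist" rule, which is assumed here to match the Boost behaviour described above. The sketch also assumes a POSIX system where an unprivileged process may create symlinks. The helper name canonical_rejects_dangling is made up for illustration.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical demo: creates <dir>/dangling -> no-such-target and reports
// whether canonical() refuses it. Uses the non-throwing overload, which
// sets ec and returns an empty path instead of throwing.
bool canonical_rejects_dangling(const fs::path& dir)
{
    fs::create_directories(dir);
    const fs::path link = dir / "dangling";
    fs::remove(link);                             // start clean; removes the link, not a target
    fs::create_symlink("no-such-target", link);   // the target never exists

    std::error_code ec;
    const fs::path result = fs::canonical(link, ec);
    return ec && result.empty();
}
```

On such a system, calling canonical_rejects_dangling(std::filesystem::temp_directory_path() / "demo") should return true.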

This seems overly restrictive to me: just because the target of the link doesn't exist, I see no reason why the function can't resolve the path of that non-existent target. (In comparison, absolute() imposes no such restriction.)

(Clearly, if a symbolic link within the path is broken, the target path can't be resolved.)
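What the question asks for can be sketched as follows. If p is a dangling symlink, read the link's stored target and interpret a relative target against the link's directory. Then canonicalise whatever part of that target does exist. This is a hypothetical helper written against C++17 std::filesystem: resolve_dangling_target is not a Boost or standard function. The hop limit of 40 is an arbitrary guard against symlink loops.

```cpp
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// Hypothetical helper: like canonical(), but if p is a dangling symlink,
// follow it (and any further dangling links) and return the path of the
// non-existent target, with its existing leading part canonicalised.
fs::path resolve_dangling_target(fs::path p)
{
    p = fs::absolute(p);
    for (int hops = 0; hops < 40; ++hops) {        // guard against link loops
        if (fs::exists(p))
            return fs::canonical(p);                // ordinary case
        if (!fs::is_symlink(p))
            return fs::weakly_canonical(p);         // plain missing path
        fs::path target = fs::read_symlink(p);      // the link's stored text
        if (target.is_relative())
            target = p.parent_path() / target;      // relative to the link's directory
        p = target;
    }
    throw std::runtime_error("too many levels of symbolic links");
}
```

Consider a link dir/link -> missing/file.txt. For it, this returns canonical(dir)/"missing"/"file.txt", the path of the non-existent target, rather than an error.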

So, is there a legitimate justification for this restriction?

And even if there is, is there not also justification for the creation of a variant of the function that does not have this restriction? (Without such a variant, obtaining the path requires error-prone manual replication of 99% of what canonical() already does.)

I appreciate that the semantic subtleties between stat() and lstat() apply equally to this case, which is precisely why I think a variant of the function is equally justified.

NB: This question is equally applicable to the std::experimental::filesystem library (n4100), which is based on boost::filesystem.

EDIT:

After @Jonathan Wakely's very knowledgeable answer below, I'm still left with the essence of my original questions, which I'll reframe slightly:

  • Is there an underlying technical or logical reason why boost::filesystem::canonical() requires the target to exist? By that I mean, does the non-existence of the target somehow make it impossible to resolve the path to canonical form?

  • If not, is there any technical or logical reason not to propose a variation of the function that differs from the existing form only in that it does not require the target to exist?

  • In the transformation (as I understand to be the case) of boost::filesystem into the proposed N4100 std::experimental::filesystem, has this restriction on canonical() been adopted after due consideration, or is it just 'falling through' from the Boost definition?

EDIT 2:

I notice that Boost 1.60 now provides the function weakly_canonical(): "Returns p with symlinks resolved and the result normalized. Returns: A path composed of the result of calling the canonical() function on a path composed of the leading elements of p that exist, if any, followed by the elements of p that do not exist, if any."
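The difference is easy to observe with the C++17 equivalents, std::filesystem::canonical and std::filesystem::weakly_canonical. These are assumed here to follow the same specification as the Boost functions. The quoted definition has a subtlety: a dangling symlink counts as an element that does not exist. So weakly_canonical() appends it unresolved rather than following it to its target. The names Comparison and compare_on_dangling are illustrative only.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

struct Comparison {
    bool canonical_failed;   // did canonical() report an error?
    fs::path weak;           // what weakly_canonical() returned
};

// Builds <dir>/sub/../dangling, where "dangling" is a broken symlink,
// and runs both functions on that path.
Comparison compare_on_dangling(const fs::path& dir)
{
    fs::create_directories(dir / "sub");
    fs::remove(dir / "dangling");
    fs::create_symlink("no-such-target", dir / "dangling");

    const fs::path p = dir / "sub" / ".." / "dangling";
    std::error_code ec;
    (void)fs::canonical(p, ec);              // fails: the link's target is missing
    return {static_cast<bool>(ec), fs::weakly_canonical(p)};
}
```

Here weakly_canonical() yields canonical(dir)/"dangling". The existing prefix dir/sub/.. is canonicalised, and the broken link's own name is kept.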

EDIT 3:

More discussion of this in relation to std::filesystem.

Jeremy asked Jul 10 '15 at 09:07



2 Answers

Try weakly_canonical(); on Mac it does not require the path to exist.

apiashko answered Oct 21 '22 at 05:10



Basically, because it's a wrapper for realpath(), which has the same requirement.

You could ask the same question of realpath(), but I think the answer is this. Suppose you're trying to find out the real, physical file or directory that a pathname refers to, and the pathname is a broken symlink. Then there is no answer: it doesn't refer to a real file or directory, so you want an error.

The OP's comment below questions my claim that filesystem::canonical and realpath implement the same operation, but the definitions in N4100 and POSIX seem almost identical to me, compare:

The realpath() function shall derive, from the pathname pointed to by file_name, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.', '..', or symbolic links.

and:

Converts p, which must exist, to an absolute path that has no symbolic link, ".", or ".." elements.

In both cases the requirements are:

  • no symbolic links: if it returned a path whose last component is a symbolic link, that requirement would not be met.

  • the canonical path refers to something that exists: this is explicit in N4100. In POSIX it is implicit: the result must resolve to some directory entry (i.e. something that exists), and that directory entry is not a symbolic link (because of the first requirement).
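The two requirements can be written down as a check. This is a hypothetical sketch in C++17 std::filesystem; looks_canonical is not a library function. It treats a path as canonical if all of the following hold: it is absolute, it exists, it contains no "." or ".." elements, and no prefix of it is a symbolic link.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical checker for the two requirements above: p is absolute,
// names something that exists (exists() follows symlinks), contains no
// "." or ".." elements, and none of its prefixes is itself a symlink.
bool looks_canonical(const fs::path& p)
{
    if (!p.is_absolute() || !fs::exists(p))
        return false;
    fs::path prefix;
    for (const fs::path& elem : p) {
        if (elem == "." || elem == "..")
            return false;
        prefix /= elem;              // grow the path one element at a time
        if (fs::is_symlink(prefix))  // lstat-style test: does not follow the link
            return false;
    }
    return true;
}
```

Because exists() follows symlinks, a dangling symlink fails the existence test. That is exactly the case canonical() refuses to produce.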

As to why those should be the requirements, the note in N4100 is helpful:

[Note: Canonical pathnames allow security checking of a path (e.g. does this path live in /home/goodguy or /home/badguy?) —end note]

As I said above, suppose it returned successfully even when the path is a symlink that doesn't actually point to anything. You would then need extra work to check whether the result resolves to a real file, which makes the intended use case less convenient.
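The security use case from the note can be sketched like this, assuming C++17 std::filesystem; lives_under is a hypothetical helper. Both paths are canonicalised, so neither symlinks nor ".." can smuggle the candidate out of root. Because canonical() fails on a dangling symlink, the check refuses it without a separate existence test.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: is `candidate` physically located inside `root`?
// Both are canonicalised with the non-throwing overload; any failure,
// including a dangling symlink, is treated as "not inside".
bool lives_under(const fs::path& root, const fs::path& candidate)
{
    std::error_code ec;
    const fs::path r = fs::canonical(root, ec);
    if (ec)
        return false;
    const fs::path c = fs::canonical(candidate, ec);
    if (ec)
        return false;
    // Element-wise prefix test, so /home/goodguy2 is not inside /home/goodguy.
    const auto mm = std::mismatch(r.begin(), r.end(), c.begin(), c.end());
    return mm.first == r.end();
}
```

Comparing path elements with std::mismatch, rather than comparing strings, keeps /home/goodguy2 from being accepted as a child of /home/goodguy.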

And even if there is, is there not also justification for the creation of a variant of the function that does not have this restriction? (Without such a variant, obtaining the path requires error-prone manual replication of 99% of what canonical() already does.)

Arguably that variant would be less commonly useful, and so should not be the default, but if you need it then it's not difficult to do:

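#include <boost/filesystem.hpp>
namespace filesystem = boost::filesystem;

// like canonical() but allows the last component of p to be a broken symlink
filesystem::path
resolve_most_symlinks(filesystem::path const& p, filesystem::path const& base = filesystem::current_path())
{
  filesystem::path abs = filesystem::absolute(p, base);  // interpret p relative to base
  if (filesystem::is_symlink(abs) && !filesystem::exists(abs))
    // broken symlink: canonicalise its directory, keep the link's own name
    return filesystem::canonical(abs.parent_path()) / abs.filename();
  return filesystem::canonical(abs);
}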
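The same idea can be ported to C++17 std::filesystem, as a sketch with one assumption. std::filesystem::absolute() and canonical() have no base parameter, so this version resolves relative paths against the current working directory only.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical C++17 port of resolve_most_symlinks(): like canonical(),
// but allows the last component of p to be a broken symlink.
fs::path resolve_most_symlinks(const fs::path& p)
{
    const fs::path abs = fs::absolute(p);   // relative to the current directory
    if (fs::is_symlink(abs) && !fs::exists(abs))
        // Dangling link: canonicalise the directory containing it and keep
        // the link's own name as the last component.
        return fs::canonical(abs.parent_path()) / abs.filename();
    return fs::canonical(abs);              // throws for any other missing path
}
```

Consider d/sub/dangling, a broken link. For it, this returns canonical(d)/"sub"/"dangling", whereas std::filesystem::canonical() would report an error.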
Jonathan Wakely answered Oct 21 '22 at 04:10
