 

Native path separator bug in C++17 std::filesystem::path?

I encountered a problem when upgrading from #include <experimental/filesystem> to #include <filesystem>: the std::filesystem::path::wstring method does not return the same string as its experimental::filesystem counterpart. I wrote the following small test program; its output is included below it.

#include <iostream>
#include <filesystem>
#include <experimental/filesystem>

namespace fs = std::filesystem;
namespace ex = std::experimental::filesystem;
using namespace std;

int main()
{
    fs::path p1{ L"C:\\temp/foo" };    
    wcout << "std::filesystem Native: " << p1.wstring() << "  Generic: " << p1.generic_wstring() << endl;

    ex::path p2{ L"C:\\temp/foo" };
    wcout << "std::experimental::filesystem Native: " << p2.wstring() << "  Generic: " << p2.generic_wstring() << endl;
}

/* Output:
std::filesystem Native: C:\temp/foo  Generic: C:/temp/foo
std::experimental::filesystem Native: C:\temp\foo  Generic: C:/temp/foo
*/

According to https://en.cppreference.com/w/cpp/filesystem/path/string:

Return value

The internal pathname in native pathname format, converted to specified string type.

The program ran on Windows 10 and was compiled with Visual Studio 2017 version 15.8.0. I would expect the native pathname to be C:\temp\foo.

Question: Is this a bug in std::filesystem::path?

Garland asked Aug 16 '18 22:08


2 Answers

Roughly, a compiler (or its standard library) has a bug when it exhibits behavior that is forbidden by the standard (either explicitly or implicitly), or behavior that diverges from its own documentation.

The standard imposes no restrictions on the format of native path strings, except that the format should be accepted by the underlying operating system (quote below). How could it impose such restrictions? The language has no say in how paths are handled by the host OS, and to do it confidently it would have to know every single target it may be compiled to, which is clearly not feasible.

[fs.class.path]

5   A pathname is a character string that represents the name of a path. Pathnames are formatted according to the generic pathname format grammar ([fs.path.generic]) or according to an operating system dependent native pathname format accepted by the host operating system.

(Emphasis mine)

The documentation of MSVC implies that the forward slash is perfectly acceptable as a separator:

Common to both systems is the structure imposed on a pathname once you get past the root name. For the pathname c:/abc/xyz/def.ext:

  • The root name is c:.
  • The root directory is /.
  • The root path is c:/.
  • The relative path is abc/xyz/def.ext.
  • The parent path is c:/abc/xyz.
  • The filename is def.ext.
  • The stem is def.
  • The extension is .ext.

It does mention a preferred separator, but this really only implies the behavior of path::make_preferred, not of the default path output:

A minor difference is the preferred separator, between the sequence of directories in a pathname. Both operating systems let you write a forward slash /, but in some contexts Windows prefers a backslash \.

The question of whether this is a bug, then, has an easy answer: since the standard imposes no restrictions on this behavior, and the compiler's documentation implies no mandatory need for a backslash, there can be no bug.

What remains is the question of whether this is a quality-of-implementation issue. After all, compiler and library implementers are expected to know all the quirks of their target and implement features accordingly.

It's up for debate which slash ('\' or '/') you should use on Windows, or whether it really matters at all, so there can be no authoritative answer. Any answer that advocates for one or the other must take care not to be too opinion-based. Also, the mere existence of path::make_preferred indicates that the native path is not necessarily the preferred one. Consider the zero-overhead principle: always converting the path to the preferred form would impose an overhead on people who don't need to be that pedantic when handling paths.

Finally, the std::experimental namespace is what it says on the box: You shouldn't expect the final standardized library to behave the same as its experimental version, or even expect that a final standardized library will exist at all. It's just the way it is, when dealing with experimental stuff.

Cássio Renan answered Oct 21 '22 17:10


No, it is not a bug!

string() et al. and c_str()/native() return the internal pathname in native pathname format.

What native means

Microsoft states that it implements ISO/IEC TS 18822:2015. The final draft of that TS defines the native pathname format in §4.11 as follows:

The operating system dependent pathname format accepted by the host operating system.

On Windows, native() returns the path as a std::wstring.

How to force the usage of backslashes as directory separator on Windows

The standard defines the term preferred-separator (see also §8.1 (pathname format grammar)):

An operating system dependent directory separator character.

A path can be converted (in place) to use the preferred-separator with path::make_preferred. In MSVC's implementation on Windows, make_preferred is declared noexcept.

Why you shouldn't worry

The Microsoft documentation on file paths says the following about / versus \:

File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.

and in the documentation on C++ file system navigation, the forward slash (called fallback-separator in newer drafts) is even used directly after the root-name:

path pathToDisplay(L"C:/FileSystemTest/SubDir3/SubDirLevel2/File2.txt ");

Example for VS2017 15.8 with /std:c++17:

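#include <filesystem>
#include <iostream>
#include <string>
namespace fs = std::filesystem;

// Prints the native and generic forms of p, then converts p in place
// to the preferred separator and prints it again.
void output(const std::string& type, fs::path& p)
{
    std::cout
        << type << ":\n"
        << "- native: " << p.string() << "\n"
        << "- generic: " << p.generic_string() << "\n";

    p.make_preferred(); // modifies p, so the caller's path changes too
    // operator<< writes the path quoted, hence the quotes and doubled
    // backslashes in the output below.
    std::cout << "- preferred-separator" << p << "\n";
}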

int main()
{
    fs::path local_win_path("c:/dir/file.ext");
    fs::path unc_path("//your-remote/dir/file.ext");

    output("local absolute win path", local_win_path);
    output("unc path", unc_path);

    unc_path = "//your-remote/dir/file.ext"; // Overwrite make_preferred applied above.
    if (fs::is_regular_file(unc_path))
    {
        std::cout << "UNC path containing // was understood by Windows std filesystem";
    }
}

Possible output (when unc_path is an existing file on an existing remote):

local absolute win path:
- native: c:/dir/file.ext
- generic: c:/dir/file.ext
- preferred-separator"c:\\dir\\file.ext"
unc path:
- native: //your-remote/dir/file.ext
- generic: //your-remote/dir/file.ext
- preferred-separator"\\\\your-remote\\dir\\file.ext"
UNC path containing // was understood by Windows std filesystem

So explicit path transformations to the preferred-separator should only be necessary when working with libraries that enforce the usage of that separator for their file system interaction.

Roi Danton answered Oct 21 '22 16:10