C++20 deprecates std::filesystem::u8path:
#include <filesystem>

std::string foo();

int main()
{
    auto path = std::filesystem::u8path(foo());
}
libstdc++ 13 has a deprecation warning in place:
<source>:7:40: warning: 'std::filesystem::__cxx11::path std::filesystem::__cxx11::u8path(const _Source&) [with _Source = std::__cxx11::basic_string<char>; _Require = path; _CharT = char]' is deprecated: use 'path((const char8_t*)&*source)' instead [-Wdeprecated-declarations]
    7 |     auto path = std::filesystem::u8path(foo());
      |                 ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
The proposed cast path((const char8_t*)&*source) looks like an outright strict aliasing violation to me, and hence UB.
Is that correct? Is GCC making any additional guarantees that make this legal?
And lastly, is there a better workaround if my path is stored in std::string and I don't want to rewrite everything to std::u8string?
In short, the suggested replacement has undefined behavior. However, the actual cause is not a strict aliasing violation as such, but a precondition violation: the argument fails the constructor's requirements because dereferencing it would hypothetically violate strict aliasing.
There is no strict aliasing violation ([basic.lval] p11) because any access of the characters would happen within the constructor of std::filesystem::path or other parts of the filesystem library, and those could be permitted to type-pun in ways that the user can't.
(const char8_t*)&* is essentially a reinterpret_cast<const char8_t*> of your data.
reinterpret_cast on its own is valid, even if accessing objects through the pointer wouldn't be. With the resulting pointer, you would call the following constructor:
template<class Source> path(const Source& source, format fmt = auto_format);
Effects: Let s be the effective range of source or the range [first, last), with the encoding converted if required. Finds the detected-format of s and constructs an object of class path for which the pathname in that format is s.
- [fs.class.path] std::filesystem::path constructor 3
The format detection, argument format conversions, and type and encoding conversions for the path are all defined mathematically or through prose. For example, the encoding conversion is defined in [fs.path.type.cvt] p3:
For member function arguments that take character sequences representing paths and for member functions returning strings, value type and encoding conversion is performed if the value type of the argument or return value differs from path::value_type. For the argument or return value, the method of conversion and the encoding to be converted to is determined by its value type:
- [...]
- char8_t: The encoding is UTF-8. The method of conversion is unspecified.
The implementation has a lot of freedom when it comes to implementing this. The std::filesystem::path constructor could have relaxed aliasing rules for instance.
The issue lies in the use of value type:
An input iterator i supports the expression *i, resulting in a value of some object type T, called the value type of the iterator.
Your iterator would be of type const char8_t*, and indirection (*i) would not be valid for it because it would hypothetically violate strict aliasing.
Therefore, what you're passing to the path constructor has no value type, and the behavior is undefined because of a precondition violation.
I was unable to find details about this in the GCC documentation, but char8_t appears to be able to alias char:
auto alias(char c) {
    return *reinterpret_cast<char8_t*>(&c); // OK, no -Wstrict-aliasing
}
See Compiler Explorer.
Presumably, you are thus relying on compiler extensions.