
libstdc++ deprecation message for u8path suggests a strict aliasing violation as a workaround?

Tags: c++, gcc, libstdc++

C++20 deprecates std::filesystem::u8path:


#include <filesystem>

std::string foo();

int main()
{
    auto path = std::filesystem::u8path(foo());
}

libstdc++ 13 has a deprecation warning in place:

<source>:7:40: warning: 'std::filesystem::__cxx11::path std::filesystem::__cxx11::u8path(const _Source&) [with _Source = std::__cxx11::basic_string<char>; _Require = path; _CharT = char]' is deprecated: use 'path((const char8_t*)&*source)' instead [-Wdeprecated-declarations]
    7 |     auto path = std::filesystem::u8path(foo());
      |                 ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~

The proposed cast path((const char8_t*)&*source) looks like an outright strict aliasing violation to me, and hence UB.

Is that correct? Is GCC making any additional guarantees that make this legal?

And lastly, is there a better workaround if my path is stored in std::string and I don't want to rewrite everything to std::u8string?

HolyBlackCat asked Oct 27 '25 06:10


1 Answer

In short, the suggested workaround does have undefined behavior. However, the cause is not a strict aliasing violation as such, but a precondition violation that stems from a hypothetical strict aliasing violation.

No undefined behavior due to strict aliasing

There is no strict aliasing violation ([basic.lval] p11) because any access of the characters would happen within the constructor of std::filesystem::path or other parts of the filesystem library, and those could be permitted to type-pun in ways that the user can't.

(const char8_t*)&* is essentially a reinterpret_cast<const char8_t*> of your data. reinterpret_cast on its own is valid, even if accessing objects through the pointer wouldn't be. With the resulting pointer, you would call the following constructor:

template<class Source>
path(const Source& source, format fmt = auto_format);

Effects: Let s be the effective range of source or the range [first, last), with the encoding converted if required. Finds the detected-format of s and constructs an object of class path for which the pathname in that format is s.

- [fs.class.path] std::filesystem::path constructor 3

The format detection, argument format conversions, and type and encoding conversions for the path are all defined mathematically or through prose. For example, the encoding conversion is defined in [fs.path.type.cvt] p3:

For member function arguments that take character sequences representing paths and for member functions returning strings, value type and encoding conversion is performed if the value type of the argument or return value differs from path::value_type. For the argument or return value, the method of conversion and the encoding to be converted to is determined by its value type:

  • [...]
  • char8_t: The encoding is UTF-8. The method of conversion is unspecified.

The implementation has a lot of freedom when it comes to implementing this. The std::filesystem::path constructor could have relaxed aliasing rules for instance.

Undefined behavior due to precondition violation

The issue lies in the use of value type:

An input iterator i supports the expression *i, resulting in a value of some object type T, called the value type of the iterator.

Your iterator would be of type const char8_t*, and indirection (*i) would not be valid for it because it would hypothetically violate strict aliasing. Therefore, what you're passing to the path constructor has no value type, and the behavior is undefined because of a precondition violation.

GCC strict aliasing relaxations between character types

I was unable to find details about this in the GCC documentation, but char8_t appears to be able to alias char:

auto alias(char c) {
    return *reinterpret_cast<char8_t*>(&c); // OK, no -Wstrict-aliasing
}

See Compiler Explorer.

Presumably, you are thus relying on compiler extensions.

Jan Schultke answered Oct 28 '25 20:10


