I came across this elegant piece of code, which, however, relies on an obscure but fundamental low-level characteristic:
std::string file_path_leaf( std::string const & path )
{
auto const pos = path.find_last_of("/\\"); // windows or posix
return path.substr( pos + 1 ); // a walk on the wild side?
}
In the edge-case (where 'find_last_of' fails) it works correctly, i.e. it leaves the string alone. But is it too obscure?
According to http://en.cppreference.com/w/cpp/string/basic_string/npos npos
is in fact ...a special value equal to the maximum value representable by the type size_type
. Cross-checked with the C++11 standard you can find this spelled out in 21.4/5, the class definition.
Then also unsigned arithmetic is equally guaranteed by the standard to wrap around, so the code is perfectly well-defined.
That said you can still make clarity arguments for changing it. At the very least add a comment stating how it works for future maintainers.
TL;DR: the code is almist certainly safe, but you could make it bulletproof by changing pos + 1
to pos + 1U
. That will avoid an obscure corner of the integer promotion rules which could only be triggered by a perverse (but legal) implementation.
Thanks to @T.C. for a useful discussion which helped me remove a number of irrelevant sidetracks from this answer.
basic_string::find_last_of
has return type size_type
([string.find.last.of] 24.3.2.7.5/p1) which is an unsigned integer type ([allocator.requirements] 20.5.3.5/Table 31). (In fact, this is almost certainly std::size_t
since std::string
uses the default allocator, whose size_type
type member is std::size_t
, according to [depr.default.allocator] D.10/p1.)
The value returned by basic_string::find_last_of
in the case that the searched object is not found is basic_string::npos
([string.find.last.of] 24.3.2.7.5/p2), which is size_type(-1)
([basic.string] 24.3.2/p5.1).
It is highly likely (but not guaranteed) that size_type
will have an integer rank greater than or equal to int
. If that is the case, then in the computation pos + 1
, the 1 will be converted to size_type
prior to doing the addition, and the addition will then be between unsigned integer values. The rules for unsigned integer arithmetic guarantee that the result of adding an unsigned 1 to an unsigned -1 must be 0.
If size_type
's rank is less than that of int
, then the result will depend on the relationship between numeric_limits::max<size_type>()
(which is precisely the value of npos
) and numeric_limits::max<int>()
:
If numeric_limits::max<size_type>() > numeric_limits::max<int>()
(which implies that size_type
and unsigned int
are the same size, even though they are different types), then pos
will be "promoted" to an unsigned int
and then the 1
will also be converted to unsigned int
. The result will be the same as above.
If numeric_limits::max<size_type>() < numeric_limits::max<int>()
, then pos
will be promoted to a (signed) int
, and the addition will be signed. But given the inequality, pos + 1
cannot overflow. The result could be too large to fit into a size_type
, of course, but the conversion from signed to unsigned integer types is well-defined and if pos == npos
, the result is guaranteed to be 0.
That leaves only the case where numeric_limits::max<size_type>() == numeric_limits::max<int>()
. One theoretical possibility for this equality would be that size_type
is precisely one bit shorter than an int
. Such an implementation would be weird, to say the least, but it is still legal. In this case, pos
will be promoted to (signed) int
, 1
will remain a signed int
, and if pos == npos
, the signed addition pos + 1
will
overflow, creating Undefined Behaviour.
To summarize, the computation is unsafe if size_type
has a smaller rank than int
but has the same maximum value. I have a hard time believing this situation would ever occur in real life but I believe that it would not violate the C++ standard.
To protect yourself from the above extremely unlikely case, it is sufficient to change the expression to pos + 1U
. That will guarantee that pos
is converted (if necessary) to an unsigned int
to match 1U
, after which the addition will be performed as unsigned arithmetic with no possibility of UB, and the result when pos == npos
(after truncation to the size of size_type
) will be 0.
JoeMo:
I thank you for your thoughtful and well-researched responses to my question.
I actually did change it (along the lines suggested by milleniumbug:(if(pos == std::string::npos) return path; etc.). Also, I agree with him or her that the code looks wrong.
The reason why the code looks wrong is also why it seemed 'elegant'; one pass through the code handles all the different cases. There are no if statements etc., just two straight statements. Can't be that simple, can it?
But it accomplishes this feat of simplification by making use of specialized knowledge at different levels of abstraction (bit-wize knowledge of the string::npos value, the way that substr works in edge-cases, etc.).
Better to keep things at the same level of abstraction, even at the cost of a little extra code.
Thanks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With