Should I fix this obscure but elegant c++ code fragment? [closed]

Question

I came across this elegant piece of code, which, however, relies on an obscure but fundamental low-level characteristic:

std::string file_path_leaf( std::string const & path )
{
    auto const pos = path.find_last_of("/\"); // windows or posix
    return path.substr( pos + 1 ); // a walk on the wild side?
}

In the edge-case (where 'find_last_of' fails) it works correctly, i.e. it leaves the string alone. But is it too obscure?

Mark B · Accepted Answer

According to http://en.cppreference.com/w/cpp/string/basic_string/npos npos is in fact ...a special value equal to the maximum value representable by the type size_type. Cross-checked with the C++11 standard you can find this spelled out in 21.4/5, the class definition.

Then also unsigned arithmetic is equally guaranteed by the standard to wrap around, so the code is perfectly well-defined.

That said you can still make clarity arguments for changing it. At the very least add a comment stating how it works for future maintainers.

rici · Answer

TL;DR: the code is almist certainly safe, but you could make it bulletproof by changing pos + 1 to pos + 1U. That will avoid an obscure corner of the integer promotion rules which could only be triggered by a perverse (but legal) implementation.

Thanks to @T.C. for a useful discussion which helped me remove a number of irrelevant sidetracks from this answer.

basic_string::find_last_of has return type size_type ([string.find.last.of] 24.3.2.7.5/p1) which is an unsigned integer type ([allocator.requirements] 20.5.3.5/Table 31). (In fact, this is almost certainly std::size_t since std::string uses the default allocator, whose size_type type member is std::size_t, according to [depr.default.allocator] D.10/p1.)
The value returned by basic_string::find_last_of in the case that the searched object is not found is basic_string::npos ([string.find.last.of] 24.3.2.7.5/p2), which is size_type(-1) ([basic.string] 24.3.2/p5.1).
It is highly likely (but not guaranteed) that size_type will have an integer rank greater than or equal to int. If that is the case, then in the computation pos + 1, the 1 will be converted to size_type prior to doing the addition, and the addition will then be between unsigned integer values. The rules for unsigned integer arithmetic guarantee that the result of adding an unsigned 1 to an unsigned -1 must be 0.
If size_type's rank is less than that of int, then the result will depend on the relationship between numeric_limits::max<size_type>() (which is precisely the value of npos) and numeric_limits::max<int>():
1. If numeric_limits::max<size_type>() > numeric_limits::max<int>() (which implies that size_type and unsigned int are the same size, even though they are different types), then pos will be "promoted" to an unsigned int and then the 1 will also be converted to unsigned int. The result will be the same as above.
2. If numeric_limits::max<size_type>() < numeric_limits::max<int>(), then pos will be promoted to a (signed) int, and the addition will be signed. But given the inequality, pos + 1 cannot overflow. The result could be too large to fit into a size_type, of course, but the conversion from signed to unsigned integer types is well-defined and if pos == npos, the result is guaranteed to be 0.
3. That leaves only the case where numeric_limits::max<size_type>() == numeric_limits::max<int>(). One theoretical possibility for this equality would be that size_type is precisely one bit shorter than an int. Such an implementation would be weird, to say the least, but it is still legal. In this case, pos will be promoted to (signed) int, 1 will remain a signed int, and if pos == npos, the signed addition pos + 1 will overflow, creating Undefined Behaviour.
To summarize, the computation is unsafe if size_type has a smaller rank than int but has the same maximum value. I have a hard time believing this situation would ever occur in real life but I believe that it would not violate the C++ standard.

To protect yourself from the above extremely unlikely case, it is sufficient to change the expression to pos + 1U. That will guarantee that pos is converted (if necessary) to an unsigned int to match 1U, after which the addition will be performed as unsigned arithmetic with no possibility of UB, and the result when pos == npos (after truncation to the size of size_type) will be 0.

JoeMo · Answer

JoeMo:

I thank you for your thoughtful and well-researched responses to my question.

I actually did change it (along the lines suggested by milleniumbug:(if(pos == std::string::npos) return path; etc.). Also, I agree with him or her that the code looks wrong.

The reason why the code looks wrong is also why it seemed 'elegant'; one pass through the code handles all the different cases. There are no if statements etc., just two straight statements. Can't be that simple, can it?

But it accomplishes this feat of simplification by making use of specialized knowledge at different levels of abstraction (bit-wize knowledge of the string::npos value, the way that substr works in edge-cases, etc.).

Better to keep things at the same level of abstraction, even at the cost of a little extra code.

Thanks.

Should I fix this obscure but elegant c++ code fragment? [closed]

Tags:

c++

language-lawyer

JoeMo

3 Answers

Mark B

rici

JoeMo

Recent Activity

Donate For Us

Should I fix this obscure but elegant c++ code fragment? [closed]

Tags:

c++

language-lawyer

JoeMo

3 Answers

Mark B

rici

JoeMo

Related questions

Recent Activity

Donate For Us