I am working through a book on C++ and it just covered using string::npos to check if a character position exists in a string. I don't understand how this mechanism could possibly know which string I'm referring to, though! This specific code is counting the number of occurrences of a substring.
Code:
for (int i=cats.find("cat",0);i!=string::npos;i=cats.find("cat",i)) {
++catCount;
++i;
}
I understand that it is starting the loop at the first occurrence of the word, incrementing the counter every pass through to avoid counting the same substring twice, and then at the end of each loop the counter is jumping to the position of the next occurrence of the substring. The loop stops when the counter does not exist as a character index for the string.
The string is called cats though, and "cats" is nowhere to be found in "string::npos", so how in the heck does it know that's the variable I'm even searching? Is it simply because that was the last variable to call .find()?
Thanks!
What is string::npos? It is a static constant member holding the largest possible value of type size_t. When passed as a length argument to the string's member functions, it means "until the end of the string". As a return value, it is usually used to indicate ...
If find cannot find what you are looking for, it returns a sentinel value, std::string::npos. There is no need to know anything about the string itself. All that is needed is to return a value which cannot be a valid index.
For example, it could be implemented as:
static const size_t npos = std::numeric_limits<size_t>::max();
size_t string::find(...)
{
// if we didn't find it...
return npos;
}
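To make that concrete, here is a rough sketch of a search function that reports failure through a sentinel. The names kNotFound and naive_find are made up for illustration, and this is a deliberately naive search, not how any standard library actually implements find:

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical stand-in for std::string::npos: the largest std::size_t value,
// which can never be a valid index into a real string.
constexpr std::size_t kNotFound = std::numeric_limits<std::size_t>::max();

// Hypothetical, naive substring search (illustration only).
// Returns the index of the first match at or after `start`, or kNotFound.
std::size_t naive_find(const std::string& haystack,
                       const std::string& needle,
                       std::size_t start = 0) {
    if (needle.size() > haystack.size()) {
        return kNotFound;
    }
    for (std::size_t i = start; i + needle.size() <= haystack.size(); ++i) {
        if (haystack.compare(i, needle.size(), needle) == 0) {
            return i;
        }
    }
    // Nothing matched: the same constant is returned no matter which
    // string was searched.
    return kNotFound;
}
```

The caller compares the returned index against the constant, so the constant itself never needs to know which string was searched.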
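Also, you should not be using an int to store the return value, as that is not the type find returns. What if you have a very long string and the index returned is greater than std::numeric_limits<int>::max()? The conversion to int then no longer preserves the value: the result is implementation-defined before C++20 and wraps around from C++20 on, so the index is corrupted either way.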
find returns a std::string::size_type. std::string::npos is a constant of that type which is returned when the value cannot be found.
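As a small sketch of why the string's name never appears in the comparison: the check below works for any string, because npos belongs to the std::string type rather than to a particular object. The helper name contains is just an assumption for this example:

```cpp
#include <string>

// Hypothetical helper: true if `needle` occurs somewhere in `text`.
// Only the index returned by find is compared with the class-wide
// constant; the string being searched plays no part in the comparison.
bool contains(const std::string& text, const std::string& needle) {
    return text.find(needle) != std::string::npos;
}
```

Calling contains(cats, "cat") and contains(dogs, "cat") on two different strings compares each result against the same std::string::npos.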
Note that std::string::size_type is an unsigned type, and int is signed. If std::string::npos cannot be represented as an int, then the result of converting std::string::npos to int is implementation-defined before C++20 (from C++20 on the conversion is defined to wrap around, which yields -1).
So you really shouldn't store the return value of std::string::find in an int. Instead, you should store it in a std::string::size_type, or in C++11 use auto. For std::basic_string<char> (that is, std::string), size_type is std::size_t, as it is for most other specializations.
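For what it's worth, the counting loop from the question could be rewritten along these lines with the index held in a std::string::size_type. The function name count_occurrences and the handling of an empty search string are my own choices for this sketch:

```cpp
#include <string>

// Counts occurrences of `needle` in `text` (overlapping matches included),
// mirroring the loop from the question but with an unsigned index type.
std::string::size_type count_occurrences(const std::string& text,
                                         const std::string& needle) {
    std::string::size_type count = 0;
    if (needle.empty()) {
        return count;  // assumption: treat an empty needle as zero matches
    }
    // Resume each search one position past the previous match so the
    // same occurrence is never counted twice.
    for (std::string::size_type i = text.find(needle);
         i != std::string::npos;
         i = text.find(needle, i + 1)) {
        ++count;
    }
    return count;
}
```

Passing i + 1 as the start position does the same job as the ++i in the original loop body.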
It does not know what string you are referring to. npos is just a static const member that represents the maximum value representable by size_type, and in this case it indicates an error. If we look at what cppreference says about std::basic_string::npos:
static const size_type npos = -1;
This is a special value equal to the maximum value representable by the type size_type. The exact meaning depends on context, but it is generally used either as end of string indicator by the functions that expect a string index or as the error indicator by the functions that return a string index.
which matches the definition for npos in the draft C++ standard, section 21.4 Class template basic_string, paragraph 5:
static const size_type npos = -1;
which is a bit odd, since size_type is unsigned, but it works because of the integral conversion rules in section 4.7 Integral conversions, which say:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).[...]
This guarantees that -1 will be converted to the largest unsigned value. It may be easier to see using the wording from the draft C99 standard, which says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
which gives us MAX + 1 - 1, which is MAX.
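If you want to convince yourself of the conversion rule, a compile-time check along these lines should do it. The helper minus_one_is_max is a name invented for this sketch:

```cpp
#include <limits>
#include <string>

// Both checks are evaluated at compile time; the file only compiles if they hold.
static_assert(static_cast<std::string::size_type>(-1) ==
                  std::numeric_limits<std::string::size_type>::max(),
              "-1 converts to the largest value of the unsigned type");
static_assert(std::string::npos ==
                  std::numeric_limits<std::string::size_type>::max(),
              "npos is that same largest value");

// Hypothetical reusable form of the same check for any unsigned type.
template <typename Unsigned>
constexpr bool minus_one_is_max() {
    return static_cast<Unsigned>(-1) == std::numeric_limits<Unsigned>::max();
}
```

The same holds for every unsigned integer type, which is why initializing npos with -1 is a portable way to spell "the maximum value".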