How can I test that a string contains only valid characters, such as the letters a-z?
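string name;
cout << "Enter your name";
cin >> name;
string letters = "qwertyuiopasdfghjklzxcvbnm";
string::iterator it;
for(it = name.begin(); it != name.end(); it++)
{
// find() takes the character itself, so dereference the iterator;
// the result is string::npos when *it is not in letters.
size_t found = letters.find(*it);
}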
To check whether a String contains only Unicode letters or digits in Java, use the Character.isLetterOrDigit() method together with charAt() and decision-making statements. The isLetterOrDigit(char ch) method determines whether the specified character is either a letter or a digit.
To extract the letters from a string with a regex in Python, you can use the regular expression r'[^a-zA-Z]' to match non-alphabetic characters and replace them with an empty string using the re.sub() function. The resulting string will contain only letters.
First, std::cin >> name will read only John if the user enters John Smith, because >> stops at whitespace characters. You should use std::getline() to get the whole name:
std::getline(std::cin, name);
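As a minimal sketch of the difference, the helper below wraps std::getline() so it can read from any stream. The name read_name is hypothetical and chosen only for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one full line (spaces included) from `in`.
// Returns an empty string if nothing could be read (e.g. end of input).
std::string read_name(std::istream& in) {
    std::string name;
    if (!std::getline(in, name)) {
        return std::string();
    }
    return name;
}
```

Calling read_name(std::cin) returns the whole line, whereas operator>> on the same input would stop at the first space.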
There are a number of ways to check that a string contains only alphabetic characters. The simplest is probably s.find_first_not_of(t), which returns the index of the first character in s that is not in t, or std::string::npos if there is no such character:
bool contains_non_alpha
= name.find_first_not_of("abcdefghijklmnopqrstuvwxyz") != std::string::npos;
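Because find_first_not_of returns an index, it can also report where the first invalid character is. A small sketch, using a hypothetical helper name and the same lowercase-only alphabet as above:

```cpp
#include <string>

// Hypothetical helper: index of the first character that is not a
// lowercase ASCII letter, or std::string::npos if every character is one.
std::string::size_type first_non_lowercase(const std::string& s) {
    return s.find_first_not_of("abcdefghijklmnopqrstuvwxyz");
}
```

An empty string yields std::string::npos, since it contains no offending character.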
That rapidly becomes cumbersome, however. To also match uppercase alphabetic characters, you’d have to add 26 more characters to that string! Instead, you may want to use a combination of find_if from the <algorithm> header and std::isalpha from <cctype>:
#include <algorithm>
#include <cctype>
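struct non_alpha {
bool operator()(char c) const {
// Cast first: passing a negative char value to std::isalpha is undefined behavior.
return !std::isalpha(static_cast<unsigned char>(c));
}
};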
bool contains_non_alpha
= std::find_if(name.begin(), name.end(), non_alpha()) != name.end();
find_if searches a range for a value that matches a predicate, in this case a functor non_alpha that returns whether its argument is a non-alphabetic character. If find_if(name.begin(), name.end(), ...) returns name.end(), then no match was found.
To do this as a one-liner, you can use the adaptors from the <functional> header:
#include <algorithm>
#include <cctype>
#include <functional>
bool contains_non_alpha
= std::find_if(name.begin(), name.end(),
std::not1(std::ptr_fun((int(*)(int))std::isalpha))) != name.end();
std::not1 produces a function object that returns the logical inverse of its input; by supplying a pointer to a function with std::ptr_fun(...), we can tell std::not1 to produce the logical inverse of std::isalpha. The cast (int(*)(int)) is there to select the overload of std::isalpha which takes an int (treated as a character) and returns an int (treated as a Boolean). Note that std::ptr_fun was deprecated in C++11 and removed in C++17, and std::not1 was removed in C++20, so this form only suits older compilers.
Or, if you can use a C++11 compiler, using a lambda cleans this up a lot:
#include <algorithm>
#include <cctype>
bool contains_non_alpha
= std::find_if(name.begin(), name.end(),
[](char c) { return !std::isalpha(static_cast<unsigned char>(c)); }) != name.end();
[](char c) -> bool { ... } denotes a function that accepts a character and returns a bool. In our case we can omit the -> bool return type because the function body consists of only a return statement. This works just the same as the previous examples, except that the function object can be specified much more succinctly.
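If the question is phrased positively ("is every character a letter?"), C++11 also offers std::all_of, which is a different algorithm from the find_if used above but takes the same kind of lambda. A sketch, with a hypothetical function name:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true when every character is alphabetic in the
// current C locale. An empty string also yields true, because all_of
// over an empty range is true.
bool is_all_alpha(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        // Taking the parameter as unsigned char keeps the argument to
        // std::isalpha in the range the function is defined for.
        return std::isalpha(c) != 0;
    });
}
```

Whether an empty name should count as valid is a separate decision; add an explicit s.empty() check if it should not.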
In C++11 you can also use a regular expression to perform the match:
#include <regex>
bool contains_non_alpha
= !std::regex_match(name, std::regex("^[A-Za-z]+$"));
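A sketch of the same check wrapped in a function (the name is hypothetical). std::regex_match only succeeds when the whole string matches, so the pattern here leaves out the ^ and $ anchors; the regex object is built once rather than on every call.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when s consists of one or more ASCII letters.
bool is_ascii_letters(const std::string& s) {
    // Constructing a std::regex is comparatively expensive, so keep one
    // instance for the lifetime of the program.
    static const std::regex letters("[A-Za-z]+");
    return std::regex_match(s, letters);
}
```

Unlike the find_if versions, this rejects the empty string, because + requires at least one character.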
None of these solutions addresses the issue of locale or character encoding! For a version of isalpha() that takes the locale explicitly instead of relying on the global C locale, you’d need to use the C++ header <locale>:
#include <locale>
bool isalpha(char c) {
std::locale locale; // Default locale.
return std::use_facet<std::ctype<char> >(locale).is(std::ctype<char>::alpha, c);
}
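The <locale> header also provides a two-argument std::isalpha(c, loc) that wraps the same use_facet call. As a sketch, passing std::locale::classic() (the "C" locale) pins the classification regardless of the program's global locale. The function name below is hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: true when every character of s is alphabetic
// according to the classic "C" locale.
bool is_alpha_classic(const std::string& s) {
    const std::locale& loc = std::locale::classic();
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        // Two-argument overload from <locale>, not the <cctype> function.
        if (!std::isalpha(*it, loc)) {
            return false;
        }
    }
    return true;
}
```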
Ideally we would use char32_t, but ctype doesn’t seem to be able to classify it, so we’re stuck with char. Luckily, we can sidestep the issue of locale entirely, because you’re probably only interested in English letters. There’s a handy header-only library called UTF8-CPP that will let us do what we need to do in a more encoding-safe way. First we define our version of isalpha() that uses UTF-32 code points:
#include <cstdint>
bool isalpha(uint32_t c) {
return (c >= 0x0041 && c <= 0x005A)
|| (c >= 0x0061 && c <= 0x007A);
}
Then we can use the utf8::iterator adaptor to adapt the basic_string::iterator from octets into UTF-32 code points:
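#include <utf8.h>
// utf8::iterator is a class template over the underlying octet iterator.
typedef utf8::iterator<std::string::iterator> utf8_iterator;
utf8_iterator first(name.begin(), name.begin(), name.end());
utf8_iterator last(name.end(), name.begin(), name.end());
// Compare against the adapted end iterator, not name.end().
bool contains_non_alpha
= std::find_if(first, last,
[](uint32_t c) { return !isalpha(c); }) != last;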
For slightly better performance at the cost of safety, you can use utf8::unchecked::iterator:
#include <utf8.h>
typedef utf8::unchecked::iterator<std::string::iterator> unchecked_iterator;
unchecked_iterator first(name.begin());
unchecked_iterator last(name.end());
bool contains_non_alpha
= std::find_if(first, last,
[](uint32_t c) { return !isalpha(c); }) != last;
Because the unchecked iterator does not validate what it reads, this version can misbehave on invalid UTF-8 input.
Using UTF8-CPP in this way assumes that the host encoding is UTF-8, or a compatible encoding such as ASCII. In theory this is still an imperfect solution, but in practice it will work on the vast majority of platforms.
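If only English letters are of interest and no library is available, a byte-level check is a possible alternative to the UTF8-CPP approach. It relies on the fact that UTF-8 encodes every code point above U+007F using only bytes with the high bit set, so no byte of a multi-byte sequence can be mistaken for an ASCII letter. A sketch with hypothetical names; it does not validate the encoding, it only reports anything outside A-Z and a-z:

```cpp
#include <string>

// Hypothetical helper: true for the bytes 'A'-'Z' and 'a'-'z' only.
bool is_english_letter(unsigned char byte) {
    return (byte >= 0x41 && byte <= 0x5A)
        || (byte >= 0x61 && byte <= 0x7A);
}

// Hypothetical helper: true if utf8_text holds any byte that is not an
// English letter. Bytes of multi-byte UTF-8 sequences are all >= 0x80,
// so any non-ASCII character is reported as well.
bool contains_non_english_letter(const std::string& utf8_text) {
    for (std::string::size_type i = 0; i < utf8_text.size(); ++i) {
        if (!is_english_letter(static_cast<unsigned char>(utf8_text[i]))) {
            return true;
        }
    }
    return false;
}
```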
I hope this answer is finally complete!