Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Case-insensitive string comparison in C++ [closed]

Tags:

c++

string

People also ask

How do you compare strings case insensitive?

The equalsIgnoreCase() method of the String class is similar to the equals() method the difference if this method compares the given string to the current one ignoring case.

Is string compare case-sensitive C?

It is case-insensitive.

Can I use == to compare strings in C?

In C, string values (including string literals) are represented as arrays of char followed by a 0 terminator, and you cannot use the == operator to compare array contents; the language simply doesn't define the operation.

What is case insensitive in C?

C Program Case Insensitive String Comparison USING stricmp() built-in string function. /* C program to input two strings and check whether both strings are the same (equal) or not using stricmp() predefined function. stricmp() gives a case insensitive comparison.


Boost includes a handy algorithm for this:

#include <boost/algorithm/string.hpp>
// Or, for fewer header dependencies:
//#include <boost/algorithm/string/predicate.hpp>

std::string str1 = "hello, world!";
std::string str2 = "HELLO, WORLD!";

if (boost::iequals(str1, str2))
{
    // Strings are identical
}

Take advantage of the standard char_traits. Recall that a std::string is in fact a typedef for std::basic_string<char>, or more explicitly, std::basic_string<char, std::char_traits<char> >. The char_traits type describes how characters compare, how they copy, how they cast etc. All you need to do is typedef a new string over basic_string, and provide it with your own custom char_traits that compare case insensitively.

struct ci_char_traits : public char_traits<char> {
    static bool eq(char c1, char c2) { return toupper(c1) == toupper(c2); }
    static bool ne(char c1, char c2) { return toupper(c1) != toupper(c2); }
    static bool lt(char c1, char c2) { return toupper(c1) <  toupper(c2); }
    static int compare(const char* s1, const char* s2, size_t n) {
        while( n-- != 0 ) {
            if( toupper(*s1) < toupper(*s2) ) return -1;
            if( toupper(*s1) > toupper(*s2) ) return 1;
            ++s1; ++s2;
        }
        return 0;
    }
    static const char* find(const char* s, int n, char a) {
        while( n-- > 0 && toupper(*s) != toupper(a) ) {
            ++s;
        }
        return s;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

The details are on Guru of The Week number 29.


The trouble with boost is that you have to link with and depend on boost. Not easy in some cases (e.g. android).

And using char_traits means all your comparisons are case insensitive, which isn't usually what you want.

This should suffice. It should be reasonably efficient. Doesn't handle unicode or anything though.

bool iequals(const string& a, const string& b)
{
    unsigned int sz = a.size();
    if (b.size() != sz)
        return false;
    for (unsigned int i = 0; i < sz; ++i)
        if (tolower(a[i]) != tolower(b[i]))
            return false;
    return true;
}

Update: Bonus C++14 version (#include <algorithm>):

bool iequals(const string& a, const string& b)
{
    return std::equal(a.begin(), a.end(),
                      b.begin(), b.end(),
                      [](char a, char b) {
                          return tolower(a) == tolower(b);
                      });
}

If you are on a POSIX system, you can use strcasecmp. This function is not part of standard C, though, nor is it available on Windows. This will perform a case-insensitive comparison on 8-bit chars, so long as the locale is POSIX. If the locale is not POSIX, the results are undefined (so it might do a localized compare, or it might not). A wide-character equivalent is not available.

Failing that, a large number of historic C library implementations have the functions stricmp() and strnicmp(). Visual C++ on Windows renamed all of these by prefixing them with an underscore because they aren’t part of the ANSI standard, so on that system they’re called _stricmp or _strnicmp. Some libraries may also have wide-character or multibyte equivalent functions (typically named e.g. wcsicmp, mbcsicmp and so on).

C and C++ are both largely ignorant of internationalization issues, so there's no good solution to this problem, except to use a third-party library. Check out IBM ICU (International Components for Unicode) if you need a robust library for C/C++. ICU is for both Windows and Unix systems.


Are you talking about a dumb case insensitive compare or a full normalized Unicode compare?

A dumb compare will not find strings that might be the same but are not binary equal.

Example:

U212B (ANGSTROM SIGN)
U0041 (LATIN CAPITAL LETTER A) + U030A (COMBINING RING ABOVE)
U00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE).

Are all equivalent but they also have different binary representations.

That said, Unicode Normalization should be a mandatory read especially if you plan on supporting Hangul, Thaï and other asian languages.

Also, IBM pretty much patented most optimized Unicode algorithms and made them publicly available. They also maintain an implementation : IBM ICU


boost::iequals is not utf-8 compatible in the case of string. You can use boost::locale.

comparator<char,collator_base::secondary> cmpr;
cout << (cmpr(str1, str2) ? "str1 < str2" : "str1 >= str2") << endl;
  • Primary -- ignore accents and character case, comparing base letters only. For example "facade" and "Façade" are the same.
  • Secondary -- ignore character case but consider accents. "facade" and "façade" are different but "Façade" and "façade" are the same.
  • Tertiary -- consider both case and accents: "Façade" and "façade" are different. Ignore punctuation.
  • Quaternary -- consider all case, accents, and punctuation. The words must be identical in terms of Unicode representation.
  • Identical -- as quaternary, but compare code points as well.