Is it possible to make std::string always hold a lower-case string?

Is it possible to make std::string always hold a lower-case string? Here's how I would use it:

typedef std::basic_string<...> lowercase_string;

void myfunc()
{
  lowercase_string s = "Hello World"; // notice mixed case
  printf("%s\n", s.c_str());          // prints "hello world" in lowercase
  std::string s2 = s;
  printf("%s\n", s2.c_str());         // prints "hello world" in lowercase
}

Aviad Rozenhek asked Jan 31 '13 12:01



3 Answers

You can write your own char traits class and pass it to std::basic_string as the second template argument.

Here is a minimal example:

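#include <cctype>   // std::tolower
#include <cstddef>  // std::size_t
#include <string>   // std::char_traits, std::basic_string

template<typename T>
struct lowercase_char_traits : std::char_traits<T>
{
    static T* copy(T* dest, const T* src, std::size_t count )
    {
         for(std::size_t i = 0 ; i < count ; ++i)
              dest[i] = static_cast<T>(std::tolower(static_cast<unsigned char>(src[i])));
         return dest;
    }
    static void assign(T & out, T in)
    {
       //go through unsigned char: std::tolower is undefined for negative values other than EOF
       out = static_cast<T>(std::tolower(static_cast<unsigned char>(in)));
    }
    //implement other overload of assign yourself

    //note that you may have to implement other functionality
    //depending on your requirement
};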

And then define a typedef as:

typedef std::basic_string<char, lowercase_char_traits<char>> lowercase;

And here is a test program:

#include <iostream>

int main()
{
    lowercase s1 = "Hello World";
    std::cout << s1.c_str() << std::endl;

    lowercase s2 = "HELLO WORLD";
    std::cout << std::boolalpha << (s1 == s2) << std::endl;

    lowercase s3 = "HELLO";
    s3 += " WorL";
    s3.append("D");
    std::cout << std::boolalpha << (s1 == s3) << std::endl;

    std::cout << s2.c_str() << std::endl;
    std::cout << s3.c_str() << std::endl;
}

Output:

hello world
true
true
hello world
hello world

Cool, isn't it?

  • Online demo

Note that for a fully working lowercase string class you may need to define more of the functions in lowercase_char_traits, depending on the behavior you want from such a class.

Have a look at Herb Sutter's brilliant article for details and explanation:
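
For instance, the count-based assign overload and move, which the minimal example leaves out, might be filled in along the following lines. This is only a sketch: the names full_lowercase_char_traits, full_lowercase and to_lower_char are made up for illustration, it is meant for T = char, and which traits functions a given standard library actually calls for each string operation is implementation-dependent.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical name: an extended variant of lowercase_char_traits (intended for T = char).
template <typename T>
struct full_lowercase_char_traits : std::char_traits<T>
{
    // Helper assumed for this sketch. It goes through unsigned char because
    // std::tolower is undefined for negative values other than EOF.
    static T to_lower_char(T c)
    {
        return static_cast<T>(std::tolower(static_cast<unsigned char>(c)));
    }

    // Single-character assignment.
    static void assign(T& out, const T& in) { out = to_lower_char(in); }

    // The overload left as an exercise: fill count slots with one character.
    static T* assign(T* dest, std::size_t count, T c)
    {
        for (std::size_t i = 0; i < count; ++i)
            dest[i] = to_lower_char(c);
        return dest;
    }

    // Copy between non-overlapping ranges, converting on the way.
    static T* copy(T* dest, const T* src, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            dest[i] = to_lower_char(src[i]);
        return dest;
    }

    // move must tolerate overlapping ranges, so the base class moves the
    // characters first and the result is converted in place afterwards.
    static T* move(T* dest, const T* src, std::size_t count)
    {
        std::char_traits<T>::move(dest, src, count);
        for (std::size_t i = 0; i < count; ++i)
            dest[i] = to_lower_char(dest[i]);
        return dest;
    }
};

typedef std::basic_string<char, full_lowercase_char_traits<char>> full_lowercase;
```

Even with these in place, comparison, find and friends still come from std::char_traits unchanged, which is fine here because the stored characters are meant to be lower case already.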

  • So you want a case-insensitive string class? Your mission, should you choose to accept it, is to write one.

Hope that helps.

Nawaz answered Sep 19 '22 15:09


You could use private inheritance. This would free you from writing a bunch of wrapper methods.

class lower_case_string : private std::string
{
public:
    // define constructors that do the conversion to lower case
    // ...

    // expose functionality from std::string
    // (this has to be in a public section: class members are private by default)
    using std::string::size;
    using std::string::length;
    using std::string::cbegin;
    // etc.

    // Make sure NOT to expose methods that allow modification as they
    // could violate your invariant that all characters are lower case.
    // E.g., don't expose std::string::begin, instead write your own.
};
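
To make the idea concrete, here is one way the skeleton might be filled in. Treat it as a sketch under assumptions: the helper to_lower_copy, the str() accessor and the choice of which members to re-export are mine, not part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lower-cased copy of its argument.
inline std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

class lower_case_string : private std::string
{
public:
    lower_case_string() = default;

    // Converting constructors: every way in lower-cases the input.
    lower_case_string(const std::string& s) : std::string(to_lower_copy(s)) {}
    lower_case_string(const char* s) : std::string(to_lower_copy(s)) {}

    // Read-only parts of the std::string interface are re-exported as they are.
    using std::string::size;
    using std::string::length;
    using std::string::empty;
    using std::string::c_str;
    using std::string::cbegin;
    using std::string::cend;

    // Mutating operations are rewritten so that they convert first.
    lower_case_string& operator+=(const std::string& s)
    {
        std::string::append(to_lower_copy(s));
        return *this;
    }

    // Explicit read-only view of the underlying string (an addition of this
    // sketch, needed because the base class is private).
    const std::string& str() const { return *this; }
};
```

With this, lower_case_string s("Hello World"); s += " AGAIN"; leaves s.str() holding only lower-case characters, while s.begin() or s[0] simply do not compile.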

Tobias Brandt answered Sep 21 '22 15:09


std::string itself cannot do this.

There are various alternatives, each more or less elegant and each with its own pros and cons. Let me try to compare them.

wrap by encapsulation

Probably the cleanest solution: create a class that contains a std::string, with constructors and assignment operators that perform the conversion.

  • problem: std::string has more than one hundred methods. If you want your class to expose all of them, be prepared to write all those functions just to call the wrapped ones. This is a plain "productivity problem" no OOP zealot seems to care about... maybe they are paid by the typed character... :-)
  • advantage: std::string does not support runtime polymorphism, and with a wrapper none can come into play by accident, so the code is safer.

"partial" wrap

Same as before, but wrapping only a few important methods and requiring some explicit coding.

A typical implementation can be:

class llstring
{
public:
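    //just make the default explicit
    llstring() :m() {}

    //this works for all the std::string constructors but the ones specifically defined here
    //(caveat: as a forwarding constructor it is also the better match for a temporary or
    //non-const std::string and for a non-const llstring, so real code should constrain it)
    template<class T, class... TT>
    llstring(T&& t, TT&&... tt) :m(std::forward<T>(t), std::forward<TT>(tt)...)
    {}

    // copy and move defaulted: just call the member ones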
    llstring(const llstring&)=default;
    llstring(llstring&&) =default;

    //impose conversion (lowercase() is a helper you have to supply)
    llstring(const std::string& s) :m(lowercase(s)) {}
    llstring(const char* s) :m(lowercase(s)) {}

    //assign and transfer defaulted
    llstring& operator=(const llstring&)=default;
    llstring& operator=(llstring&&)=default;

    //impose conversion
    llstring& operator=(const std::string& s) { m = lowercase(s); return *this; }
    llstring& operator=(const char* s) { m = lowercase(s); return *this; }

    //gets the "value"
    const std::string& str() const { return m; } 

private:
    std::string m;
};

By itself this class offers no algorithms or operations, but it can take part in whatever std::string code you like through a call to str(), and it can accept any std::string result, converting it on the way in.

Probably a good compromise between recoding and maintenance risks.
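
As a rough illustration of that round trip through str(), here is a cut-down variant that compiles on its own. It is a sketch, not the class above: lowercase() is a hypothetical helper written for this example, and the forwarding constructor is left out on purpose so that every way in goes through the conversion.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical helper standing in for lowercase(): returns a lower-cased copy.
inline std::string lowercase(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

class llstring
{
public:
    llstring() = default;

    // Construction always converts.
    llstring(std::string s) : m(lowercase(std::move(s))) {}
    llstring(const char* s) : m(lowercase(s)) {}

    // Assignment from string-like values always converts as well.
    llstring& operator=(std::string s) { m = lowercase(std::move(s)); return *this; }
    llstring& operator=(const char* s) { m = lowercase(s); return *this; }

    // Read-only access to the value: the way out towards std::string code.
    const std::string& str() const { return m; }

private:
    std::string m;
};
```

A typical use would be a = a.str() + " WORLD"; where the concatenation is ordinary std::string work and the result is lower-cased again when it is assigned back.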

Inherit

Use std::string as a base instead of a member. The code is similar to the above (you have to provide a way to convert on construction or assignment).

  • advantage: the interface and behavior of the original std::string are automatically exposed, so all the std::string methods work and are accessible.

  • neutral: conversion works both forward (by design) and backward (through base inheritance) from std::string. This may lead to some ambiguity, since certain operations may not go through llstring. Not a problem in itself, but you must be sure how function name resolution and binding are done. The language is well specified here, but this is one corner of it that the average programmer does not always know.

  • disadvantage: llstring can be used polymorphically as a std::string, but std::string does not behave polymorphically with respect to llstring (no methods are virtual, including the destructor). Hence you must never call delete on a std::string* that points to an llstring: it is undefined behavior.

Considering that both llstring and string are value types, this should not normally happen (in 30 years I never wrote a single new std::string or delete pstring). But it will in any case attract all the rants of OOP zealots who insist that classic OOP rules apply to strings as well, even though they are not OOP objects.

But there is another, IMHO more subtle, risk: in a compound expression mixing llstring and string, all intermediate results will be string, and the intermediate operations will not convert in between. And all of that is implicit. Again, the language spec is well defined, but it may not be easy to keep control of everything. A search on an intermediate result not yet assigned may fail... because of an unexpected capital letter inside.
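
A small sketch of that trap, assuming an inheritance-based llstring with only the two converting constructors and a hypothetical lowercase() helper (both written just for this example):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lower-cased copy of its argument.
inline std::string lowercase(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Inheritance-based variant, reduced to the bare minimum.
struct llstring : std::string
{
    llstring(const std::string& s) : std::string(lowercase(s)) {}
    llstring(const char* s) : std::string(lowercase(s)) {}
};

// Returns true when the not-yet-assigned intermediate result still
// contains the capital 'W'.
inline bool finds_capital_in_intermediate()
{
    llstring a = "Hello ";   // stored lower-cased
    // operator+ is the std::string one, so its result is a plain std::string
    // and nothing lower-cases "World" here.
    return (a + "World").find('W') != std::string::npos;
}

// Assigning the same expression back to an llstring converts again.
inline llstring assigned_back()
{
    llstring a = "Hello ";
    return a + "World";      // std::string result goes through the converting constructor
}
```

So the invariant holds for every llstring object, but not for the temporaries in between, which is exactly where a search can go wrong.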

Back conversion

Not exactly what you asked but... maybe inverting the problem suits you better.

Instead of "convert when reaching the destination", "convert when leaving the source":

Write a wrapper (like in the "partial" wrap above) that, instead of converting implicitly from string and having an explicit str() function, is constructed explicitly from string (even with no conversion) and has an implicit conversion to string (operator std::string() { return lowercase(m); }).

This works the opposite way to what you asked. It is a good fit if the points where capitalized strings are permitted to exist are few compared with the total number of strings in your program (which you can assume are always lowercase), and if you can guarantee that none of the std::string operations you perform on lowercase values will ever generate an uppercase one.
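
In code, the inverted wrapper could look roughly like this. The class name lowering_source, the lowercase() helper and the position_of() function are all made up for the sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: returns a lower-cased copy of its argument.
inline std::string lowercase(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical name for the "convert when leaving the source" wrapper.
class lowering_source
{
public:
    // Explicit construction, value stored unchanged.
    explicit lowering_source(std::string s) : m(std::move(s)) {}

    // Implicit conversion to std::string, lower-casing on the way out.
    operator std::string() const { return lowercase(m); }

private:
    std::string m;
};

// Any function taking a std::string receives the lower-cased value.
inline std::size_t position_of(const std::string& haystack, const std::string& needle)
{
    return haystack.find(needle);
}
```

Everything downstream keeps using plain std::string; only the few places where mixed-case text enters the program need to mention lowering_source.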

Edit: the char_traits solution

Added after Nawaz's post:

That solution tries to change the behavior (not the value) by making the char type adhere to different semantics.

  • Advantage: simple, requires no big wrappers, fast to code.
  • Disadvantage: may not be fully what is intended. Since the std::string functions are all accessible, and since copy may not be the only way to change a string's content, you have no guarantee that there will never be capitalized chars in it, unless you can guarantee that copy is the only way a string value gets altered.
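
One such way around the traits, sketched with a reduced copy of the traits class from Nawaz's answer (the function first_char_after_write is hypothetical, added only to show the effect): a write through operator[] is a plain char assignment and never reaches copy or assign.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

template <typename T>
struct lowercase_char_traits : std::char_traits<T>
{
    static T* copy(T* dest, const T* src, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            dest[i] = static_cast<T>(std::tolower(static_cast<unsigned char>(src[i])));
        return dest;
    }
    static void assign(T& out, const T& in)
    {
        out = static_cast<T>(std::tolower(static_cast<unsigned char>(in)));
    }
    static T* assign(T* dest, std::size_t count, T c)
    {
        for (std::size_t i = 0; i < count; ++i)
            assign(dest[i], c);
        return dest;
    }
};

typedef std::basic_string<char, lowercase_char_traits<char>> lowercase;

// Hypothetical demonstration: returns the first character after writing a
// capital letter through operator[].
inline char first_char_after_write()
{
    lowercase s = "hello";
    s[0] = 'H';      // assigns through a char&, no traits function is involved
    return s[0];     // the capital letter is now stored in the string
}
```

The same applies to writes through non-const iterators or data(), so the traits trick only protects the paths that the library itself routes through the traits functions.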

Note: just like string, char_traits has no virtual destructor, but, unlike with string, no OOP zealot usually shouts about inheriting from it. And if asked, they will most likely say "there will be no dynamic allocation of char_traits". Bye bye coherence.

In conclusion

There is no "perfect solution" at "low cost". All of them are imperfect at some stage.

Emilio Garavaglia answered Sep 19 '22 15:09