Remove all commas, dots and lowercase the string with single iteration

In my C++ application I need to remove all dots, commas and exclamation marks from a string and convert it to lower case. So far I figured out I can do it with std::string::erase and std::remove like this:

#include <algorithm>
#include <cctype>
#include <string>

std::string content = "Some, NiceEeeE text ! right HeRe .";

content.erase(std::remove(content.begin(), content.end(), ','), content.end());
content.erase(std::remove(content.begin(), content.end(), '.'), content.end());
content.erase(std::remove(content.begin(), content.end(), '!'), content.end());
std::transform(content.begin(), content.end(), content.begin(), ::tolower);

So my question is: can I do this without iterating four times through the string? Are there better ways to do this with simple C++?

asked Apr 26 '14 at 21:04 by Deepsy


4 Answers

Ignoring iterations performed inside std::remove and erase (which you already do), you can use std::remove_if and provide your own custom predicate:

#include <algorithm>
#include <string>

content.erase(std::remove_if(content.begin(),
                             content.end(),
                             [](char c)
                             { return c == ',' || c == '.' || c == '!'; }),
              content.end());

Then you can use std::transform to convert the remaining string to lower case:

#include <cctype>
#include <algorithm>

// Take unsigned char so negative char values never reach std::tolower.
std::transform(content.begin(),
               content.end(),
               content.begin(),
               [] (unsigned char c) { return static_cast<char>(std::tolower(c)); });
answered Nov 20 '22 at 07:11 by juanchopanza


Try this, which builds the result in a single pass:

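#include <cctype>
#include <string>

std::string result;
for (std::string::size_type loop = 0; loop < content.length(); ++loop) {
    switch (content[loop]) {
        case ',':
        case '!':
        case '.':
            break;  // drop punctuation
        default:
            result += static_cast<char>(std::tolower(static_cast<unsigned char>(content[loop])));
    }
}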
answered Nov 20 '22 at 09:11 by Ed Heal


This sounds like a conditional std::transform, so you could write a transform_if:

template <typename InIt, typename OutIt, typename UnOp, typename Pred>
OutIt transform_if(InIt first, InIt last, OutIt dest, UnOp op, Pred pr)
{
    while (first != last) {
        if (pr(*first)) {
            *dest = op(*first);
            ++dest;
        }
        ++first;
    }
    return dest;
}

Usage in this case would be:

#include <locale>

content.erase(transform_if(
    content.begin(), content.end(),
    content.begin(),
    [](char c){ return std::tolower(c, std::locale()); },
    [](char c){ return !(c == ',' || c == '.' || c == '!'); }
), content.end());
answered Nov 20 '22 at 07:11 by Blastfurnace


If you want to do this in a single pass, it's pretty easy to do with a standard for loop. Using standard library routines might be preferred in general, but if you want it done in a single pass and there's not a good fit in the library, then I see no harm in just using a loop.

#include <cctype>
#include <iostream>
#include <ostream>
#include <string>

using namespace std;

int main()
{
    string exclude_chars(",.!");
    string content = "Some, NiceEeeE text ! right HeRe .";  

    auto write_iter = content.begin();

    for (auto read_iter = content.begin(); read_iter != content.end(); ++read_iter) {
        auto c = *read_iter;

        if (exclude_chars.find(c) != string::npos) continue;

        *write_iter = tolower( (unsigned char) c);
        ++write_iter;
    }

    content.erase(write_iter, content.end());

    cout << content << endl;
}

If you need this functionality in more than one place and/or need the exclusion characters or transformation to be parameterized, then it's also pretty easy to turn that snippet of code into a function that takes those things as arguments.

For example, here's a template function that does the filter and transform in one pass:

#include <ctype.h>
#include <iostream>
#include <ostream>
#include <string>

template <class InputIter, class OutputIter, class UnaryOp, class UnaryPred>
OutputIter filter_and_transform(
                    InputIter first, 
                    InputIter last,
                    OutputIter result, 
                    UnaryPred pred,
                    UnaryOp op)
{
    while (first!=last) {
        if (pred(*first)) {
            *result = op(*first);
            ++result;
        }
        ++first;
    }

    return result;
}


int main()
{
    std::string exclude_chars(",.!");
    std::string content = "Some, NiceEeeE text ! right HeRe .";  

    content.erase( 
        filter_and_transform( begin(content), end(content), 
                              begin(content),
                              [&exclude_chars](char c) {
                                    return exclude_chars.find(c) == std::string::npos;
                              },
                              [](char c) -> char {
                                    return tolower((unsigned char) c);
                              }),
        end(content)
     );

    std::cout << content << std::endl;
}

It's more generic, but I'm not convinced it's more readable.


Update (29 Apr 2014)

I decided to play around with the idea of having a custom filter_iterator<> perform the filtering. When I got frustrated over the amount of boilerplate code I had to get working, I looked into whether Boost had anything similar. Sure enough, Boost has exactly that type, plus a transform_iterator, and the two can be composed to get the following alternative single-pass filter-and-transform operation:

// boost::transform_iterator<> might need the following define
//  in order to work with lambdas (see http://stackoverflow.com/questions/12672372)
#define BOOST_RESULT_OF_USE_DECLTYPE

#include <algorithm>
#include <ctype.h>
#include <iostream>
#include <ostream>
#include <string>

#include "boost/iterator/filter_iterator.hpp"
#include "boost/iterator/transform_iterator.hpp"

/*
    relaxed_copy<>() works like std::copy<>() but is safe to use in 
    situations where result happens to be equivalent to first.

    std::copy<> requires that result not be in the range [first,last) - it's
    understandable that result cannot be in the range [first,last) in general,
    but it should be safe for the specific situation where result == first.
    However, the standard doesn't allow for this particular exception, so 
    relaxed_copy<>() exists to be able to safely handle that scenario.

*/
template <class InputIter, class OutputIter>
OutputIter relaxed_copy(
                InputIter first, 
                InputIter last,
                OutputIter result)
{
    while (first!=last) {
        *result = *first;
        ++first;
        ++result;
    }

    return result;
}


int main()
{
    std::string exclude_chars(",.!");
    std::string content = "Some, NiceEeeE text ! right HeRe .";  

    // set up filter_iterators over the string to filter out ",.!" characters
    auto filtered_first = 
        boost::make_filter_iterator(
            [&exclude_chars](char c) {
                return exclude_chars.find(c) == std::string::npos;
            },
            begin(content),
            end(content)
        );

    auto filtered_last = 
        boost::make_filter_iterator( 
            filtered_first.predicate(), 
            end(content),
            end(content)   // pass the end explicitly; the default Iterator() is not a valid end
        );

    // set up transform_iterators 'on top of' the filter_iterators
    //  to transform the filtered characters to lower case
    auto trans_first = 
        boost::make_transform_iterator( 
            filtered_first, 
            [](char c) -> char {
                return tolower((unsigned char) c);
            }
        );

    auto trans_last  = 
        boost::make_transform_iterator( 
            filtered_last, 
            trans_first.functor()
        );

    // now copy using the composed iterators and erase any leftovers
    content.erase(
        relaxed_copy(trans_first, trans_last, begin(content)),
        end(content)
    );

    std::cout << content << std::endl;
}

I think this is a pretty nifty technique, but I still think it would be hard to argue that it's clear at a glance what's going on.

answered Nov 20 '22 at 08:11 by Michael Burr