Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++11 using std::equal_range with custom comparison function

Tags:

c++

c++11

Consider this example

(Note that this is just something I made up to illustrate the problem. I am well aware there are more efficient ways to parse an arithmetic expression and though the subject is fascinating, this has nothing to do with my actual question. It's just a semi-realistic example, if I might say so.
I agree the parser thing might make the question seem more complicated, but I could not think of a more abstract example).

Assume you want to do a simple expression parser. You will get bits of strings from a tokenizer, some of them being possibly ambiguous.

For instance, the string "-" could represent an unary minus or a binary minus.

Assume you want to get all possible meanings for string "-".

You could do as follows:

1) define a sorted array describing all possible operators

// types of operators
enum class opType: char { unary, lasso, rasso, none };

// operator descriptors
struct opDesc {
    string symbol;
    opType type;
    char   priority;

    // partial order comparison
    bool operator< (const opDesc& a) const
    {
        // unary operators first
        if (symbol == a.symbol) return type < a.type;
        return symbol < a.symbol;
    }

    // comparison with strings
    static bool comp_desc_str (const opDesc& a, const string& s)
    {
        return a.symbol < s;
    }
    static bool comp_str_desc (const string& s, const opDesc& a)
    {
        return s < a.symbol;
    }
};

static opDesc op_descriptors[] = {
    { "+" , opType::unary, 8 }, // unary +
    { "-" , opType::unary, 8 }, // unary -
    { "*" , opType::lasso, 6 }, // multiplication
    { "/" , opType::lasso, 6 }, // division
    { "+" , opType::lasso, 5 }, // addition
    { "-" , opType::lasso, 5 }, // substraction
};

2) use std::equal_range to get all possible matches for a given string

// sort descriptors by value and type
sort(begin(op_descriptors), end(op_descriptors));

// do some searches
string patterns[] = { "+", "-", ">>", "**" };

for (string s : patterns)
{
    pair<opDesc*, opDesc*> ops;

    ops = equal_range(
        std::begin(op_descriptors), 
        std::end  (op_descriptors), 
        s,
        opDesc::comp_desc_str);

    cout << s <<": "<< ops.first[0] << ops.second[-1] << endl;
}

This code won't compile, complaining about opDesc::comp_desc_str (it expects parameters the other way around, i.e. string first, opDesc next).

If I try to replace the function with a version that takes its arguments in reverse order:

ops = equal_range(
    std::begin(op_descriptors), 
    std::end  (op_descriptors), 
    s,
    opDesc::comp_str_desc);

it won't compile either, complaining about parameters being yet again in the wrong order (at some other point of the algorithm).

This code, however, will work (see a live version here)

#include <regex>
#include <iostream>

using namespace std;

// types of operators
enum class opType: char { unary, lasso, rasso, none };

// operator descriptors
struct opDesc {
    string symbol;
    opType type;
    char   priority;

    // partial order comparison
    bool operator< (const opDesc& a) const
    {
        // unary operators first
        if (symbol == a.symbol) return type < a.type;
        return symbol < a.symbol;
    }

    // comparison with strings
    static bool comp_desc_str (const opDesc& a, const string& s)
    {
        return a.symbol < s;
    }
    static bool comp_str_desc (const string& s, const opDesc& a)
    {
        return s < a.symbol;
    }

    // display
    friend ostream& operator<<(ostream& os, const opDesc& op);
};

ostream& operator<<(ostream& os, const opDesc& op)
{
    os << op.symbol << "[" << (int)op.type << ":" << (int)op.priority << "]";
    return os;
}

static opDesc op_descriptors[] = {
    { "+" , opType::unary, 8 }, // unary +
    { "-" , opType::unary, 8 }, // unary -
    { "~" , opType::unary, 8 }, // bitwise not
    { "**", opType::rasso, 7 }, // power
    { "*" , opType::lasso, 6 }, // multiplication
    { "/" , opType::lasso, 6 }, // division
    { "%" , opType::lasso, 6 }, // remainder
    { "+" , opType::lasso, 5 }, // addition
    { "-" , opType::lasso, 5 }, // substraction
    { "<<", opType::lasso, 4 }, // left shift
    { ">>", opType::lasso, 4 }, // right shift
    { "&" , opType::lasso, 3 }, // bitwise and
    { "^" , opType::lasso, 2 }, // bitwise xor
    { "|" , opType::lasso, 1 }, // bitwise or 
    { "(" , opType::none , 0 }, // braces
    { ")" , opType::none , 0 }
};

int main(void)
{
    // sort descriptors by value and type
    sort(begin(op_descriptors), end(op_descriptors));

    // do some searches
    string patterns[] = { "+", "-", ">>", "**" };

    for (string s : patterns)
    {
        pair<opDesc*, opDesc*> ops;
        // this won't work
        /*
        ops = equal_range(
            std::begin(op_descriptors), 
            std::end  (op_descriptors), 
            s,
            opDesc::comp_desc_str or opDesc::comp_str_desc);
        */
        // this works
        ops.first = lower_bound(
            std::begin(op_descriptors), 
            std::end  (op_descriptors), 
            s, opDesc::comp_desc_str);
        ops.second = upper_bound(
            std::begin(op_descriptors), 
            std::end  (op_descriptors), 
            s, opDesc::comp_str_desc);
        cout << s <<": "<< ops.first[0] << ops.second[-1] << endl;
    }
}

output:

+: +[0:8]+[1:5]     // unary and binary "-" operators found
-: -[0:8]-[1:5]     // same thing for "+"
>>: >>[1:4]>>[1:4]  // first == second when there is only
**: **[2:7]**[2:7]  // one version of the operator

I tried this code on VisualC++ 2013 and g++ with the same results
(only the obfuscation of the template error messages vary).

Questions

  • is there a particular reason why lower_bound and upper_bound should require two different custom comparison functions?
  • is there a workaround for MSVC throwing a bogus error in debug build when using functors to work around the problem?
  • is there a definite workaround for this problem (i.e. using equal_range as intended instead of doing the job twice and make it compile in debug mode on Visual C++ 2013)?
like image 640
kuroi neko Avatar asked Feb 10 '14 11:02

kuroi neko


3 Answers

std::lower_bound requires comp(*it, val) whereas std::upper_bound requires comp(val, *it).

So your comp functor have to provide both bool operator () (const opDesc& a, const string& s) const and bool operator ()(const string& s, const opDesc& a) const.

So, you may use following comp functor:

struct lessOpDescWithString
{
    bool operator () (const opDesc& lhs, const std::string& rhs) const {
       return opDesc::comp_desc_str(lhs, rhs);
    }

    bool operator () (const std::string& lhs, const opDesc& rhs) const {
       return opDesc::comp_str_desc(lhs, rhs);
    }
};
like image 86
Jarod42 Avatar answered Nov 10 '22 22:11

Jarod42


I suppose that std::lower_bound and std::upper_bound will use a binary search on your sorted array and must be able to compare obDesc to std::string and vice versa. I would suggest making a comparator like

struct obDescStrCmp {
    bool operator()(const opDesc& lhs, const opDesc& rhs) const {
       // code to compare obDesc to opDesc
    }

    bool operator()(const opDesc& lhs, const std::string& rhs) const {
       // code to compare obDesc to std::string
    }

    bool operator()(const std::string& lhs, const opDesc& rhs) const {
       // code to compare std::string to opDesc
    }

    bool operator()(const std::string& lhs, const std::string& rhs) const {
       // code to compare std::string to std::string
       // I'm not sure if this is really necessary.
    }
};

and pass it to your std algorithm of choice instead of relying on the operators defined in your opDesc struct. The compiler should select the correct overload based on the actual order of parameters in the implementations of the std algorithms.

Edit: Replace operator< with operator() to make the struct callable.

like image 3
Kristian Duske Avatar answered Nov 10 '22 22:11

Kristian Duske


My own answer, only summarizing other contributions, notably Jarod42's:

Why the two comparisons

The algorithm of equal_range requires both > and < comparisons between internal (opDesc in this example) and foreign (std::string) types.

You can't deduce a<b from !(b<a), because of the == case, so you have to provide two different comparators.

Functionnally, you could pick any working combination of comparison operations, for instance < and > or < and <=, but the std:: guys settled for a fixed < comparison with arguments swapped around, which is a choice dictated by the signature of the function: it only has to define a (type, foreign type) and a (foreign type, type) variant.

lower_bound only requires < (expressed as type < foreigh type) while upper_bound only requires > (expressed as foreign type < type), so both can work with a single function, but equal_range must have access to both prototypes.

The way to do it

The practical solution is to define a function object aka functor to do the job:

// operator descriptors
struct opDesc {
    string symbol;
    opType type;
    char   priority;

    // partial order comparison
    bool operator< (const opDesc& a) const
    {
        // unary operators first
        if (symbol == a.symbol) return type < a.type;
        return symbol < a.symbol;
    }

    // functor to compare with strings
    struct comp
    {
        bool operator() (const opDesc& a, const std::string& b) const
        {
           return a.symbol < b;
        }

        bool operator() (const std::string& a, const opDesc& b) const
        {
           return a < b.symbol;
        }
    };

and use it like so:

pair<opDesc*, opDesc*> ops;
ops = equal_range(
    std::begin(op_descriptors), 
    std::end  (op_descriptors), 
    s,
    opDesc::comp()); // <- functor to provide two different comparison functions

MSVC bug

Besides, this won't compile on MSVC++ 2013 due to an obscure paranoid check enabled only in debug mode. The release version will compile fine, as will the code in g++ regardless of debug level.

Judging from the cryptic names used, it seems the template checks whether the comparison defines a total order (which it shouldn't since the whole point of this API is to work on partially ordered structures).

My current (ugly) workaround is to disable some internal debug flag:

#if (defined _MSC_VER && defined _DEBUG)
#define _ITERATOR_DEBUG_LEVEL 1
#endif // _MSC_VER && _DEBUG

prior to including std:: headers

Another possible workaround suggested by Jarod42 is to define the missing comparison function.

    // functor to compare with strings
    struct comp
    {
        bool operator() (const opDesc& a, const std::string& b)
        { return a.symbol < b; }
        bool operator() (const std::string& a, const opDesc& b)
        { return a < b.symbol; }

        // just to make Microsoft Visual C++ happy when compiling in debug mode
        bool operator() (const opDesc& a, const opDesc& b)
        { assert(false); return false; }
    };
like image 3
kuroi neko Avatar answered Nov 10 '22 21:11

kuroi neko