Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Validity of pointer returned by operator->

I'm implementing a two-dimensional array container (like boost::multi_array<T,2>, mostly for practice). In order to use double-index notation (a[i][j]), I introduced a proxy class row_view (and const_row_view but I'm not concerned about constness here) which keeps a pointer to the beginning and end of the row.

I would also like to be able to iterate over rows and over elements within a row separately:

matrix<double> m;
// fill m
for (row_view row : m) {
    for (double& elem : row) {
        // do something with elem
    }
}

Now, the matrix<T>::iterator class (which is meant to iterate over rows) keeps a private row_view rv; internally to keep track of the row the iterator is pointing to. Naturally, iterator also implements dereferenciation functions:

  • for operator*(), one would usually want to return a reference. Instead, here the right thing to do seems to return a row_view by value (i.e. return a copy of the private row_view). This ensures that when the iterator is advanced, the row_view still points to the previous row. (In a way, row_view acts like a reference would).
  • for operator->(), I'm not so sure. I see two options:

    1. Return a pointer to the private row_view of the iterator:

      row_view* operator->() const { return &rv; }
      
    2. Return a pointer to a new row_view (a copy of the private one). Because of storage lifetime, that would have to be allocated on the heap. In order to ensure clean-up, I'd wrap it in a unique_ptr:

      std::unique_ptr<row_view> operator->() const {
          return std::unique_ptr<row_view>(new row_view(rv));
      }
      

Obviously, 2 is more correct. If the iterator is advanced after operator-> is called, the row_view that is pointed to in 1 will change. However, the only way I can think of where this would matter, is if the operator-> was called by its full name and the returned pointer was bound:

matrix<double>::iterator it = m.begin();
row_view* row_ptr = it.operator->();
// row_ptr points to view to first row
++it;
// in version 1: row_ptr points to second row (unintended)
// in version 2: row_ptr still points to first row (intended)

However, this is not how you'd typically use operator->. In such a use case, you'd probably call operator* and keep a reference to the first row. Usually, one would immediately use the pointer to call a member function of row_view or access a member, e.g. it->sum().

My question now is this: Given that the -> syntax suggests immediate use, is the validity of the pointer returned by operator-> considered to be limited to that situation, or would a safe implementation account for the above "abuse"?

Obviously, solution 2 is way more expensive, as it requires heap-allocation. This is of course very much undesirable, as dereferenciation is quite a common task and there is no real need for it: using operator* instead avoids these problems as it returns a stack-allocated copy of the row_view.

like image 665
Jonas Greitemann Avatar asked Jun 26 '17 11:06

Jonas Greitemann


1 Answers

As you know, operator-> is applied recursively on the functions return type until a raw pointer is encountered. The only exception is when it's called by name like in your code sample.

You can use that to your advantage and return a custom proxy object. To avoid the scenario in your last code snippet, this object needs to satisfy several requirements:

  1. Its type name should be private to the matrix<>::iterator, so outside code could not refer to it.

  2. Its construction/copy/assignment should be private. matrix<>::iterator will have access to those by virtue of being a friend.

An implementation will look somewhat like this:

template <...>
class matrix<...>::iterator {
private:
  class row_proxy {
    row_view *rv_;
    friend class iterator;
    row_proxy(row_view *rv) : rv_(rv) {}
    row_proxy(row_proxy const&) = default;
    row_proxy& operator=(row_proxy const&) = default;
  public:
    row_view* operator->() { return rv_; }
  };
public:
  row_proxy operator->() {
    row_proxy ret(/*some row view*/);
    return ret;
  }
};

The implementation of operator-> returns a named object to avoid any loopholes due to guaranteed copy elision in C++17. Code that use the operator inline (it->mem) will work as before. However, any attempt to call operator->() by name without discarding the return value, will not compile.

Live Example

struct data {
    int a;
    int b;
} stat;

class iterator {
    private:
      class proxy {
        data *d_;
        friend class iterator;
        proxy(data *d) : d_(d) {}
        proxy(proxy const&) = default;
        proxy& operator=(proxy const&) = default;
      public:
        data* operator->() { return d_; }
      };
    public:
      proxy operator->() {
        proxy ret(&stat);
        return ret;
      }
};


int main()
{
  iterator i;
  i->a = 3;

  // All the following will not compile
  // iterator::proxy p = i.operator->();
  // auto p = i.operator->();
  // auto p{i.operator->()};
}

Upon further review of my suggested solution, I realized that it's not quite as fool-proof as I thought. One cannot create an object of the proxy class outside the scope of iterator, but one can still bind a reference to it:

auto &&r = i.operator->();
auto *d  = r.operator->();

Thus allowing to apply operator->() again.

The immediate solution is to qualify the operator of the proxy object, and make it applicable only to rvalues. Like so for my live example:

data* operator->() && { return d_; }

This will cause the two lines above to emit an error again, while the proper use of the iterator still works. Unfortunately, this still doesn't protect the API from abuse, due to the availability of casting, mainly:

auto &&r = i.operator->();
auto *d  = std::move(r).operator->();

Which is a death blow to the whole endeavor. There is no preventing this.

So in conclusion, there is no protection from a direction call to operator-> on the iterator object. At the most, we can only make the API really hard to use incorrectly, while the correct usage remains easy.

If creation of row_view copies is expansive, this may be good enough. But that is for you to consider.

Another point for consideration, which I haven't touched on in this answer, is that the proxy could be used to implement copy on write. But that class could be just as vulnerable as the proxy in my answer, unless great care is taken and fairly conservative design is used.

like image 102
StoryTeller - Unslander Monica Avatar answered Oct 20 '22 17:10

StoryTeller - Unslander Monica