What is the approach to implement operator[] / iterator to deal with performance issue?

Tags:

c++

Let's say that I have a custom container MyList like following.

template<class T>
class MyList
{
public:
    const T& get (size_t index) const
    {
        Block& block = getBlock (index);
        return block.get (index - block.startIndex);
    }
    void set (size_t index, const T& value)
    {
        Block& block = getBlock (index);
        block.dirty = true;
        block.set (index - block.startIndex, value);
    }
    // ...
};

Block is basically a cached block of data loaded from disk (there can be 2-3 such LRU blocks). If a block is dirty, when it is not memory, then its data would be written to disk.

My questions are the following.

Overloading operator[] does not get the correct operator[] used. I overloaded the following two:

 T& operator[](size_t index)
 {
     Block& block = getBlock (index);
     block.dirty = true;
     return block.get (index - block.startIndex);
 }
 const T& operator[](size_t index) const
 {
     return get (index);
 }

However, it appears that g++ calls the first one if the MyList is not a const version. That is, for a code like the following:

MyList<int> list;
// ...
int myValue = list[0];

The operator[] called is always the first one which would set the dirty bit. This is not what I want.

Did I do something wrong? While I could certainly call get() or use const_iterator, my concern is that the user of the MyList would use operator[] in undesired situations. Is there a better way of operator overloading that does not have this issue?

What about iterators? I had the similar issue for operator* in iterators. That is the first one is always called if the iterator itself is not const.

T& operator* ();
const T& operator* () const;

My goal is to run C++ STL algorithms such as sort / nth_element / etc on this container. If we consider the search phase and swap phase of C++ STL algorithms, it appears to me that there is a potential for faster sorting if the correct iterator is used, if data is mostly sorted already.

MyList<int> list;
// ...
std::sort (list.begin (), list.end ());

Is there a way to minimize the number of blocks with dirty bit being set if the data is not changed?

Can you provide any pointers for what to look at / follow / learn? Or is this a limitation of the language or compiler?

742

asked Jan 13 '21 21:01

user1456982

2 Answers

The operator is called as per the standard.

To understand why: (with very few exceptions) all (sub)expressions in C++ evaluate exactly the same regardless of the enclosing (full)expression.

This means that the call list[0] is exactly the same in: int a = list[0] and list[0] = 11. So when choosing the overload for list[0] the surrounding expressions are completely ignored. The only things that are taken into account are the type of list and the type of 0. Since list is non-const and a non-const overload with a perfect match is provided, the non-const overload is chosen.

So the problem in your case is that when you call a method on a non-const object there is no way to know if that will further be used to read or write.

Thankfully there are a few solutions to this problem:

Return a proxy

Instead of returning a reference to the element, return a proxy object. Set the dirty bit only when trying to write through that proxy object. This same strategy can be extended to work with iterators.

Provide a const view

Provide something like std::span:

std::span<const T> as_const_view() const;

This will implicitly enable the iterators you want. The downside is that the user needs to be aware of this quirk of your class and avoid calling operator[] for read access.

Don't provide these kinds of overloads

Stick to get and set and avoid confusion.

199

answered Nov 15 '22 04:11

bolov

Block is basically a cached block of data loaded from disk (there can be 2-3 such LRU blocks).

Assuming that you're using an operating system: The operating system already does all this for you. You already pay the cost of doing a variant of LRU replacement, buffer space management, all that - it's done. So you shouldn't be doing any of it yet again - what's the point? If anything, to help with prefetch, you should tell the OS what file location(s) you're anticipating to use next - if you have this information due to how your list is used. Other than that, just map the file to memory and synchronize it to disk forcibly if you think you need that for data integrity reasons. On POSIX/Linux platforms, you'd use msync for that, and there's a Windows equivalent as well.

The operator[] called is always the first one which would set the dirty bit. This is not what I want.

The correct operator is called per C++ spec, so you do in fact "want" it even if you don't know that you do. If you think a bit about what purpose the const method overload resolution serves, you'll realize that doing it any other way would irreparably break the language.

If anything, you shouldn't be setting the dirty bit until something in that block actually changes, i.e. the block should be offering getter and setter methods to change its contents. An alternative, in your existing approach, is to allocate the memory for the block in a individually mapped memory page(s), load the block from disk, then set the pages write-protected. Genuine write access will trigger a signal (on Unix) or a similar mechanism on Windows, and you can then unprotect the page, set its dirty bit, and resume the program.

But that's utterly moot once you memory-map the file these blocks come from: the operating system does all the dirty-bit generation for you!

answered Nov 15 '22 03:11

Kuba hasn't forgotten Monica

Related questions
                            
                                Iterating over container or range - problem with constness
                            
                                constexpr function with unused reference argument – gcc vs clang
                            
                                What is the correct way to compile multiple test sources with Catch2?
                            
                                Unknown type name uint64_t and uint16_t uint8_t in Linux [closed]
                            
                                fastest way to convert two-bit number to low-memory representation
                            
                                A safe, standard-compliant way to make a class template specialization fail to compile using `static_assert` only if it is instantiated?
                            
                                What causes this weird behavior with throwing destructors during unwinding of a try-block?
                            
                                CUDA(GPU) as OpenCV backend
                            
                                Calling FFTW's in-place real-to-complex transform without violating strict aliasing rules
                            
                                How to initialize a matrix ONCE in a constexpr constructor?
                            
                                C++17 split constexpr string on comma and have the number of elements at compile time?
                            
                                Is there any way to cast a std::any containing a derived pointer to a base pointer, without knowing the derived type?
                            
                                Why is there no SIMD functionality in the C++ standard library?
                            
                                Must enable_shared_from_this be the first base class?
                            
                                Non-type pointer and reference template parameters and how/why they are resolved at compile time. c++
                            
                                Forward-declaring template variable in C++
                            
                                Disallowing overriding virtual method returning const reference with a method returning non-const reference
                            
                                error: unable to handle compilation, expected exactly one compiler job in '' [clang-diagnostic-error]
                            
                                Is modifying the internal bytes of a const object undefined behavior in case it contains another object constructed by placement new?
                            
                                Why is default noexcept move constructor being accepted?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With