I profiled my program, and found that changing from standard allocator to a custom one-frame allocator can remove my biggest bottleneck.
Here is a dummy snippet (coliru link):-
class Allocator{ //can be stack/heap/one-frame allocator
    //some complex field and algorithm
    //e.g. virtual void* allocate(int amountByte,int align)=0;
    //e.g. virtual void deallocate(void* v)=0;
};
template<class T> class MyArray{
    //some complex field
    Allocator* allo=nullptr;
    public: MyArray( Allocator* a){
        setAllocator(a);
    }
    public: void setAllocator( Allocator* a){
        allo=a;
    }
    public: void add(const T& t){
        //store "t" in some array
    }
    //... other functions
};
However, my one-frame allocator has a drawback: the user must make sure that every object allocated by the one-frame allocator is deleted/released at the end of the time-step.
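To make the drawback concrete, here is a minimal sketch of what a one-frame (bump) allocator might look like. It is not my actual class: the name OneFrameAllocator, the fixed-capacity buffer, the reset() and used() members, and the use of std::size_t for align are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-frame (bump) allocator; an illustration only.
class OneFrameAllocator {
    std::vector<unsigned char> buffer_; // backing storage for one frame
    std::size_t offset_ = 0;            // bytes handed out so far
public:
    explicit OneFrameAllocator(std::size_t capacity) : buffer_(capacity) {}

    // align must be a power of two. Returns nullptr when the buffer is exhausted.
    void* allocate(std::size_t amountByte, std::size_t align) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t newOffset = static_cast<std::size_t>(aligned - base) + amountByte;
        if (newOffset > buffer_.size()) return nullptr;
        offset_ = newOffset;
        return reinterpret_cast<void*>(aligned);
    }

    // Individual frees are no-ops; this is what makes the allocator cheap.
    void deallocate(void*) {}

    // Called once at the end of the time-step. Every pointer handed out
    // during the frame is dangling after this call.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }
};
```

Because reset() reclaims everything at once, any object that still points into the buffer after the end of the time-step is invalid, which is why nothing allocated this way may survive into the next frame.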
Here is an example of use-case.
I use the one-frame allocator to store temporary result of M3 (overlapping surface from collision detection; wiki link) in Physics Engine.
Here is a snippet. M1, M2 and M3 are all manifolds, but in different level of detail :-
Allocator oneFrameAllocator;
Allocator heapAllocator;
class M1{}; //e.g. a single-point collision site
class M2{ //e.g. analysed many-point collision site
    public: MyArray<M1> m1s{&oneFrameAllocator};
};
class M3{ //e.g. analysed collision surface
    public: MyArray<M2> m2s{&oneFrameAllocator};
};
Notice that I set the default allocator to be oneFrameAllocator (because it is a CPU-saver).
Because I create instances of M1, M2 and M3 only as temporary variables, it works.
Now, I want to cache a new instance of M3, output_m3=m3;, for the next timeStep.
(^ To check whether a collision has just started or just ended)
In other words, I want to copy the one-frame allocated m3 to the heap allocated output_m3 at #3 (shown below).
Here is the game-loop :-
int main(){
    M3 output_m3; //must use "heapAllocator"
    for(int timeStep=0;timeStep<100;timeStep++){
        //v start complex computation #2
        M3 m3;
        M2 m2;
        M1 m1;
        m2.m1s.add(m1);
        m3.m2s.add(m2);
        //^ end complex computation
        //output_m3=m3; (change allocator, how? #3)
        //.... clean up oneFrameAllocator here ....
    }
}
I can't assign output_m3=m3 directly, because output_m3 will copy the usage of the one-frame allocator from m3.
My poor solution is to create output_m3 from the bottom up.
The code below works, but it is very tedious.
M3 reconstructM3(M3& src,Allocator* allo){
    //very ugly here #1
    M3 m3New;
    m3New.m2s.setAllocator(allo);
    for(int n=0;n<src.m2s.size();n++){
        M2 m2New;
        m2New.m1s.setAllocator(allo);
        for(int k=0;k<src.m2s[n].m1s.size();k++){
            m2New.m1s.add(src.m2s[n].m1s[k]);
        }
        m3New.m2s.add(m2New);
    }
    return m3New;
}
output_m3=reconstructM3(m3,&heapAllocator);
How to switch allocator of an object elegantly (without propagating everything by hand)?
Passing the allocator type as a class template parameter (e.g. MyArray<T,StackAllocator>) is undesirable.
[...] Allocator::allocate() and Allocator::deallocate().
[...] operator=() like MSalters advised, but I can't find a proper way to achieve it.
Reference: After receiving an answer from JaMiT, I found that this question is similar to Using custom allocator for AllocatorAwareContainer data members of a class.
At its core, this question is asking for a way to use a custom allocator with a multi-level container. There are other stipulations, but after thinking about this, I've decided to ignore some of those stipulations. They seem to be getting in the way of solutions without a good reason. That leaves open the possibility of an answer from the standard library: std::scoped_allocator_adaptor and std::vector.
Perhaps the biggest change with this approach is tossing the idea that a container's allocator needs to be modifiable after construction (toss the setAllocator member). That idea seems questionable in general and incorrect in this specific case. Look at the criteria for deciding which allocator to use:
One-frame allocation is used if the object does not outlive one iteration of the loop over timeStep.
Heap allocation is used if the object must survive beyond one iteration of the loop over timeStep.
That is, you can tell which allocation strategy to use by looking at the scope of the object/variable in question. (Is it inside or outside the loop body?) Scope is known at construction time and does not change (as long as you don't abuse std::move). So the desired allocator is known at construction time and does not change. However, the current constructors do not permit specifying an allocator. That is something to change. Fortunately, such a change is a fairly natural extension of introducing scoped_allocator_adaptor.
The other big change is tossing the MyArray class. Standard containers exist to make your programming easier. Compared to writing your own version, the standard containers are faster to implement (as in, already done) and less prone to error (the standard strives for a higher bar of quality than "works for me this time"). So out with the MyArray template and in with std::vector.
The code snippets in this section can be joined into a single source file that compiles. Just skip over my commentary between them. (This is why only the first snippet includes headers.)
Your current Allocator class is a reasonable starting point. It just needs a pair of methods that indicate when two instances are interchangeable (i.e. when both are able to deallocate memory that was allocated by either of them). I also took the liberty of changing amountByte to an unsigned type, since allocating a negative amount of memory does not make sense. (I left the type of align alone though, since there is no indication of what values this would take. Possibly it should be unsigned or an enumeration.)
#include <cstdlib>
#include <functional>
#include <scoped_allocator>
#include <type_traits> // std::true_type
#include <vector>
class Allocator {
public:
    virtual void * allocate(std::size_t amountByte, int align)=0;
    virtual void deallocate(void * v)=0;
    //some complex field and algorithm

    // **** Addition ****
    // Two objects are considered equal when they are interchangeable at deallocation time.
    // There might be a more refined way to define this relation, but without the internals
    // of Allocator, I'll go with simply being the same object.
    bool operator== (const Allocator & other) const { return this == &other; }
    bool operator!= (const Allocator & other) const { return this != &other; }
};
Next up are the two specializations. Their details are outside the scope of the question, though. So I'll just mock up something that will compile (needed since one cannot directly instantiate an abstract base class).
// Mock-up to allow defining the two allocators.
class DerivedAllocator : public Allocator {
public:
    void * allocate(std::size_t amountByte, int) override { return std::malloc(amountByte); }
    void deallocate(void * v) override { std::free(v); }
};
DerivedAllocator oneFrameAllocator;
DerivedAllocator heapAllocator;
Now we get into the first meaty chunk: adapting Allocator to the standard's expectations. This consists of a wrapper template whose parameter is the type of object being constructed. If you can parse the Allocator requirements, this step is simple. Admittedly, parsing the requirements is not simple since they are designed to cover "fancy pointers".
// Standard interface for the allocator
template <class T>
struct AllocatorOf {

    // Some basic definitions:

    //Allocator & alloc; // A plain reference is an option if you don't support swapping.
    std::reference_wrapper<Allocator> alloc; // Or a pointer if you want to add null checks.

    AllocatorOf(Allocator & a) : alloc(a) {} // Note: Implicit conversion allowed

    // Maybe this value would come from a helper template? Tough to say, but as long as
    // the value depends solely on T, the value can be a static class constant.
    static constexpr int ALIGN = 0;

    // The things required by the Allocator requirements:

    using value_type = T;

    // Rebind from other types:
    template <class U>
    AllocatorOf(const AllocatorOf<U> & other) : alloc(other.alloc) {}

    // Pass through to Allocator:
    T * allocate (std::size_t n) { return static_cast<T *>(alloc.get().allocate(n * sizeof(T), ALIGN)); }
    void deallocate(T * ptr, std::size_t) { alloc.get().deallocate(ptr); }

    // Support swapping (helps ease writing a constructor)
    using propagate_on_container_swap = std::true_type;
};
// Also need the interchangeability test at this level.
// Note: get() belongs to the std::reference_wrapper member, not to AllocatorOf.
template<class T, class U>
bool operator== (const AllocatorOf<T> & a_t, const AllocatorOf<U> & a_u)
{ return a_t.alloc.get() == a_u.alloc.get(); }
template<class T, class U>
bool operator!= (const AllocatorOf<T> & a_t, const AllocatorOf<U> & a_u)
{ return a_t.alloc.get() != a_u.alloc.get(); }
Next up are the manifold classes. The lowest level (M1) does not need any changes.
The mid-levels (M2) need two additions to get the desired results.
1. The member type allocator_type needs to be defined. Its existence indicates that the class is allocator-aware.
2. A constructor that takes an object to copy plus an allocator to use is needed. (scoped_allocator works by automatically appending the allocator to the provided construction parameters. Since the sample code makes copies inside the vectors, a "copy-plus-allocator" constructor is needed.)
In addition, for general use, the mid-levels should get a constructor whose lone parameter is an allocator. For readability, I'll also bring back the MyArray name (but not the template).
The highest level (M3) just needs the constructor taking an allocator. Still, the two type aliases are useful for readability and consistency, so I'll throw them in as well.
class M1{}; //e.g. a single-point collision site

class M2{ //e.g. analysed many-point collision site
public:
    using allocator_type = std::scoped_allocator_adaptor<AllocatorOf<M1>>;
    using MyArray = std::vector<M1, allocator_type>;

    // Default construction still uses oneFrameAllocator, but this can be overridden.
    explicit M2(const allocator_type & alloc = oneFrameAllocator) : m1s(alloc) {}

    // "Copy" constructor used via scoped_allocator_adaptor
    //M2(const M2 & other, const allocator_type & alloc) : m1s(other.m1s, alloc) {}
    // You may want to instead delegate to the true copy constructor. This means that
    // the m1s array will be copied twice (unless the compiler is able to optimize
    // away the first copy). So this would need to be performance tested.
    M2(const M2 & other, const allocator_type & alloc) : M2(other)
    {
        MyArray realloc{other.m1s, alloc};
        m1s.swap(realloc); // This is where we need swap support.
    }

    MyArray m1s;
};

class M3{ //e.g. analysed collision surface
public:
    using allocator_type = std::scoped_allocator_adaptor<AllocatorOf<M2>>;
    using MyArray = std::vector<M2, allocator_type>;

    // Default construction still uses oneFrameAllocator, but this can be overridden.
    explicit M3(const allocator_type & alloc = oneFrameAllocator) : m2s(alloc) {}

    MyArray m2s;
};
Let's see... two lines added to Allocator (could be reduced to just one), four-ish to M2, three to M3, eliminate the MyArray template, and add the AllocatorOf template. That's not a huge difference. Well, a little more than that count if you want to leverage the auto-generated copy constructor for M2 (but with the benefit of fully supporting the swapping of vectors). Overall, not that drastic a change.
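To see in isolation how scoped_allocator_adaptor hands the outer container's allocator to the elements it constructs, here is a small sketch that stands apart from the manifold classes. TaggedAlloc and the function innerIdAfterPushBack are hypothetical names invented for this illustration; the integer id merely stands in for "one-frame" versus "heap".

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <scoped_allocator>
#include <vector>

// Hypothetical allocator that carries an id so we can see which one is in use.
template <class T>
struct TaggedAlloc {
    using value_type = T;
    int id;
    explicit TaggedAlloc(int i) : id(i) {}
    template <class U>
    TaggedAlloc(const TaggedAlloc<U>& other) : id(other.id) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) { return a.id == b.id; }
template <class T, class U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) { return a.id != b.id; }

using Inner = std::vector<int, TaggedAlloc<int>>;
using OuterAlloc = std::scoped_allocator_adaptor<TaggedAlloc<Inner>>;
using Outer = std::vector<Inner, OuterAlloc>;

// Copies an Inner built with allocator 1 into an Outer that uses allocator 2,
// then reports which allocator the stored copy ended up with.
int innerIdAfterPushBack() {
    TaggedAlloc<int> frameLike(1);
    Inner source(frameLike);
    source.push_back(7);

    OuterAlloc heapLike(TaggedAlloc<Inner>(2));
    Outer target(heapLike);
    target.push_back(source); // element is constructed as Inner(source, allocator)

    return target.front().get_allocator().id;
}
```

The stored copy reports id 2 even though the source was built with id 1, because the adaptor appended the outer allocator when constructing the element. This is the same mechanism that calls the "copy-plus-allocator" constructor of M2.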
Here is how the code would be used:
int main()
{
    M3 output_m3{heapAllocator};
    for ( int timeStep = 0; timeStep < 100; timeStep++ ) {
        //v start complex computation #2
        M3 m3;
        M2 m2;
        M1 m1;
        m2.m1s.push_back(m1); // <-- vector uses push_back() instead of add()
        m3.m2s.push_back(m2); // <-- vector uses push_back() instead of add()
        //^ end complex computation
        output_m3 = m3; // change to heap allocation
        //.... clean up oneFrameAllocator here ....
    }
}
The assignment seen here preserves the allocation strategy of output_m3 because AllocatorOf does not say to do otherwise. This seems to be what should be the desired behavior, not the old way of copying the allocation strategy. Note that if both sides of an assignment already use the same allocation strategy, it doesn't matter if the strategy is preserved or copied. Hence, existing behavior should be preserved with no need for further changes.
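The rule behind this is that std::allocator_traits defaults propagate_on_container_copy_assignment to std::false_type when the allocator does not define it, so copy assignment copies the elements but keeps the destination's allocator. A minimal sketch, again using a hypothetical TaggedAlloc whose id stands in for the allocation strategy (idAfterAssignment is likewise an invented name):

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator that carries an id so we can see which one is in use.
template <class T>
struct TaggedAlloc {
    using value_type = T;
    int id;
    explicit TaggedAlloc(int i) : id(i) {}
    template <class U>
    TaggedAlloc(const TaggedAlloc<U>& other) : id(other.id) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
    // No propagate_on_container_copy_assignment alias here, so
    // allocator_traits treats it as std::false_type.
};
template <class T, class U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) { return a.id == b.id; }
template <class T, class U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) { return a.id != b.id; }

// Assigns a vector using allocator 1 to a vector using allocator 2 and
// returns the allocator id the destination has afterwards.
int idAfterAssignment() {
    TaggedAlloc<int> frameLike(1);
    TaggedAlloc<int> heapLike(2);
    std::vector<int, TaggedAlloc<int>> frameVec(frameLike);
    std::vector<int, TaggedAlloc<int>> heapVec(heapLike);
    frameVec.push_back(42);

    heapVec = frameVec; // elements are copied; the allocator is not

    return heapVec.get_allocator().id;
}
```

If the allocator instead declared using propagate_on_container_copy_assignment = std::true_type;, the destination would adopt the source's allocator, which is the old behavior the question wanted to avoid.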
Aside from specifying that one variable uses heap allocation, use of the classes is no messier than it was before. Since it was assumed that at some point there would be a need to specify heap allocation, I don't see why this would be objectionable. Use the standard library – it's there to help.
Since you're aiming at performance, I assume that your classes would not manage the lifetime of the allocator itself, and would simply use its raw pointer. Also, since you're changing storage, copying is inevitable. In this case, all you need is to add a "parametrized copy constructor" to each class, e.g.:
template <typename T> class MyArray {
private:
    Allocator& _allocator;

public:
    MyArray(Allocator& allocator) : _allocator(allocator) { }
    // "other" is const so that this can be called from M2's parametrized copy constructor
    MyArray(const MyArray& other, Allocator& allocator) : MyArray(allocator) {
        // copy items from "other", passing new allocator to their parametrized copy constructors
    }
};

class M1 {
public:
    M1(Allocator& allocator) { }
    M1(const M1& other, Allocator& allocator) { }
};

class M2 {
public:
    MyArray<M1> m1s;

public:
    M2(Allocator& allocator) : m1s(allocator) { }
    M2(const M2& other, Allocator& allocator) : m1s(other.m1s, allocator) { }
};
This way you can simply do:
M3 stackM3(stackAllocator);
// do processing
M3 heapM3(stackM3, heapAllocator); // or return M3(stackM3, heapAllocator);
to create a copy that is based on the other allocator.
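For a version of this idea that can actually be compiled and observed, here is a reduced sketch. It is not the code above: CountingAllocator, IntArray (a fixed-size stand-in for MyArray<int>) and Holder (a stand-in for M2) are hypothetical names, and the allocator simply counts live blocks on top of std::malloc so you can see where each copy's memory comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator that counts live blocks so the source of memory is visible.
class CountingAllocator {
public:
    int liveBlocks = 0;
    void* allocate(std::size_t bytes) { ++liveBlocks; return std::malloc(bytes); }
    void deallocate(void* p) { --liveBlocks; std::free(p); }
};

// Simplified stand-in for MyArray<int>: fixed size, no growth.
class IntArray {
    CountingAllocator& allocator_;
    int* data_ = nullptr;
    std::size_t size_ = 0;
public:
    IntArray(CountingAllocator& allocator, std::size_t size)
        : allocator_(allocator), size_(size) {
        data_ = static_cast<int*>(allocator_.allocate(size_ * sizeof(int)));
        for (std::size_t i = 0; i < size_; ++i) data_[i] = 0;
    }
    // Parametrized copy constructor: same contents, different allocator.
    IntArray(const IntArray& other, CountingAllocator& allocator)
        : IntArray(allocator, other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }
    // The plain copy operations are disabled so the allocator is never copied by accident.
    IntArray(const IntArray&) = delete;
    IntArray& operator=(const IntArray&) = delete;
    ~IntArray() { allocator_.deallocate(data_); }

    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }
};

// Stand-in for M2: forwards the allocator to its member.
class Holder {
public:
    IntArray values;
    Holder(CountingAllocator& allocator, std::size_t n) : values(allocator, n) {}
    Holder(const Holder& other, CountingAllocator& allocator)
        : values(other.values, allocator) {}
};
```

Constructing Holder kept(temp, heap) from a Holder temp(frame, 3) leaves one live block in each allocator, with the contents duplicated. Each class only has to forward the allocator to its members, at the cost of writing the extra constructor for every level.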
Also, depending on your actual code structure, you can add some template magic to automate things:
template <typename T> class MX {
public:
    MyArray<T> ms;

public:
    MX(Allocator& allocator) : ms(allocator) { }
    MX(const MX& other, Allocator& allocator) : ms(other.ms, allocator) { }
}; // the closing semicolon is required after a class definition

class M2 : public MX<M1> {
public:
    using MX<M1>::MX; // inherit constructors
};

class M3 : public MX<M2> {
public:
    using MX<M2>::MX; // inherit constructors
};