I'm currently learning C++ on my own, and I am curious about how push_back() and emplace_back() work under the hood. I've always assumed that emplace_back() is faster when you are trying to construct and push a large object to the back of a container, like a vector.
Let's suppose I have a Student object that I want to append to the back of a vector of Students.
struct Student {
string name;
int student_ID;
double GPA;
string favorite_food;
string favorite_prof;
int hours_slept;
int birthyear;
Student(string name_in, int ID_in, double GPA_in, string food_in,
string prof_in, int sleep_in, int birthyear_in) :
/* initialize member variables */ { }
};
Suppose I call push_back() and push a Student object to the end of a vector:
vector<Student> vec;
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
My understanding here is that push_back creates an instance of the Student object outside of the vector and then moves it to the back of the vector.
I can also emplace instead of push:
vector<Student> vec;
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
My understanding here is that the Student object is constructed at the very back of the vector so that no moving is required.
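The two mental models above can be checked by counting constructor calls. What follows is only a minimal sketch: Counted is a hypothetical stand-in for Student (it is not part of the original code), and count_push_back and count_emplace_back are illustrative helper names. The vector reserves space first so that reallocation does not add extra moves to the count.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Student that counts how it gets constructed.
struct Counted {
    static int from_args;  // built from constructor arguments
    static int copies;     // copy-constructed
    static int moves;      // move-constructed

    std::string name;
    int id;

    Counted(std::string name_in, int id_in)
        : name(std::move(name_in)), id(id_in) { ++from_args; }
    Counted(const Counted& other)
        : name(other.name), id(other.id) { ++copies; }
    Counted(Counted&& other) noexcept
        : name(std::move(other.name)), id(other.id) { ++moves; }

    static void reset() { from_args = copies = moves = 0; }
};

int Counted::from_args = 0;
int Counted::copies = 0;
int Counted::moves = 0;

struct Counts {
    int from_args;
    int copies;
    int moves;
};

// push_back: a temporary is built first, then moved into the vector.
Counts count_push_back() {
    Counted::reset();
    std::vector<Counted> vec;
    vec.reserve(1);  // avoid reallocation so only the insertion is counted
    vec.push_back(Counted("Bob", 123456));
    return {Counted::from_args, Counted::copies, Counted::moves};
}

// emplace_back: the arguments are forwarded and the element is built in place.
Counts count_emplace_back() {
    Counted::reset();
    std::vector<Counted> vec;
    vec.reserve(1);
    vec.emplace_back("Bob", 123456);
    return {Counted::from_args, Counted::copies, Counted::moves};
}
```

Under these assumptions, push_back reports one construction from arguments plus one move, while emplace_back reports only the construction from arguments. The counts say nothing about how expensive each operation is, which is where the timing question comes in.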
Thus, it would make sense that emplacing would be faster, especially if many Student objects are added. However, when I timed these two versions of code:
for (int i = 0; i < 10000000; ++i) {
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
}
and
for (int i = 0; i < 10000000; ++i) {
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
}
I expected the latter to be faster, since the large Student object wouldn't have to be moved. Oddly enough, the emplace_back version ended up being slower (across multiple attempts). I also tried inserting 10000000 Student objects where the constructor takes its parameters by reference and the arguments to push_back() and emplace_back() are stored in variables. That made no difference: emplace_back was still slower.
I've checked to make sure that I'm inserting the same number of objects in both cases. The time difference isn't too large, but emplacing ended up slower by a few seconds.
Is there something wrong with my understanding of how push_back() and emplace_back() work? Thank you very much for your time!
Here's the code, as requested. I'm using the g++ compiler.
Push back:
#include <string>
#include <vector>
using namespace std;

struct Student {
string name;
int student_ID;
double GPA;
string favorite_food;
string favorite_prof;
int hours_slept;
int birthyear;
Student(string name_in, int ID_in, double GPA_in, string food_in,
string prof_in, int sleep_in, int birthyear_in) :
name(name_in), student_ID(ID_in), GPA(GPA_in),
favorite_food(food_in), favorite_prof(prof_in),
hours_slept(sleep_in), birthyear(birthyear_in) {}
};
int main() {
vector<Student> vec;
vec.reserve(10000000);
for (int i = 0; i < 10000000; ++i)
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
return 0;
}
Emplace back:
#include <string>
#include <vector>
using namespace std;

struct Student {
string name;
int student_ID;
double GPA;
string favorite_food;
string favorite_prof;
int hours_slept;
int birthyear;
Student(string name_in, int ID_in, double GPA_in, string food_in,
string prof_in, int sleep_in, int birthyear_in) :
name(name_in), student_ID(ID_in), GPA(GPA_in),
favorite_food(food_in), favorite_prof(prof_in),
hours_slept(sleep_in), birthyear(birthyear_in) {}
};
int main() {
vector<Student> vec;
vec.reserve(10000000);
for (int i = 0; i < 10000000; ++i)
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
return 0;
}
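For reference, both loops can also be timed inside a single program with a harness along these lines. This is a minimal sketch, not the code that produced the timings reported above: Timing, time_inserts, time_push_back, and time_emplace_back are hypothetical names, and the element count is a parameter so that small runs don't need gigabytes of memory. The Student constructor is kept exactly as in the question (by-value parameters that are then copied).

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

struct Student {
    std::string name;
    int student_ID;
    double GPA;
    std::string favorite_food;
    std::string favorite_prof;
    int hours_slept;
    int birthyear;
    Student(std::string name_in, int ID_in, double GPA_in, std::string food_in,
            std::string prof_in, int sleep_in, int birthyear_in)
        : name(name_in), student_ID(ID_in), GPA(GPA_in),
          favorite_food(food_in), favorite_prof(prof_in),
          hours_slept(sleep_in), birthyear(birthyear_in) {}
};

struct Timing {
    double millis;     // wall-clock time spent in the insertion loop
    std::size_t size;  // number of elements actually inserted
};

// Runs `insert(vec)` `count` times on a pre-reserved vector and times the loop.
template <typename Insert>
Timing time_inserts(std::size_t count, Insert insert) {
    std::vector<Student> vec;
    vec.reserve(count);  // keep reallocation out of the measurement
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        insert(vec);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {elapsed.count(), vec.size()};
}

Timing time_push_back(std::size_t count) {
    return time_inserts(count, [](std::vector<Student>& v) {
        v.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
    });
}

Timing time_emplace_back(std::size_t count) {
    return time_inserts(count, [](std::vector<Student>& v) {
        v.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
    });
}
```

Calling time_push_back(n) and time_emplace_back(n) with the same n and comparing the millis fields gives a rough comparison. Results will depend heavily on the optimization level and the standard library, so a single run like this should be treated as an indication rather than a benchmark.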
my_vec.push_back("foo") constructs a temporary string from the string literal and then moves that string into the container, whereas my_vec.emplace_back("foo") constructs the string directly in the container, avoiding the extra move.
In other words, emplace_back constructs the object in place in the vector, while push_back first constructs an anonymous temporary and then copies or moves it into the vector.
When an element is inserted with push_back, the new element is copy- or move-constructed. The insertion can be inefficient if the argument passed to push_back is a temporary, because the temporary is constructed, copied or moved, and then destroyed.
This behavior is due to the complexity of std::string. There are a couple of things interacting here: the small string optimization (SSO), and how much the compiler knows about the string literals at each call site.
In the push_back version, the compiler is able to determine the length of each string at compile time, whereas it was unable to do so for the emplace_back version. Thus, the emplace_back call requires calls to strlen. Furthermore, since the compiler doesn't know the length of the string literal, it has to emit code for both the SSO and non-SSO cases (see Jason Turner's "Initializer Lists Are Broken, Let's Fix Them"; it's a long talk, but he follows the problem of inserting strings into a vector throughout it).
Consider this simpler type:
struct type {
std::string a;
std::string b;
std::string c;
type(std::string a, std::string b, std::string c)
: a{a}
, b{b}
, c{c}
{}
};
Note how the constructor copies a, b, and c.
Testing this against a baseline of just allocating memory, we can see that push_back outperforms emplace_back:
[Benchmark chart omitted; the original post links to quick-bench]
Because the strings in your example all fit inside the SSO buffer, copying is just as cheap as moving in this case. Thus, the constructor is perfectly efficient, and the improvements from emplace_back have a smaller effect.
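Whether a given string fits in the SSO buffer can be checked by looking at where its character data lives. The sketch below is implementation-specific and only illustrative: stored_in_object and the three wrapper functions are hypothetical helpers, and the standard does not guarantee SSO or any particular buffer size. It assumes a mainstream standard library (libstdc++ with the C++11 ABI, libc++, or MSVC), all of which keep a five-character string inline.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// True when the character buffer lives inside the std::string object itself,
// which is what the small string optimization does for short strings.
bool stored_in_object(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end = begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}

// A short string like the ones in the question.
bool short_string_is_inline() {
    std::string s = "pizza";
    return stored_in_object(s);
}

// A string far longer than any common SSO buffer must live on the heap.
bool long_string_is_inline() {
    std::string s(100, 'x');
    return stored_in_object(s);
}

// Moving a short string cannot steal a heap pointer, because there is none:
// the destination ends up with the characters in its own inline buffer.
bool moved_short_string_is_inline() {
    std::string source = "pizza";
    std::string dest = std::move(source);
    return stored_in_object(dest);
}
```

The last function is the relevant one for this question: for strings this short, a move still has to copy the bytes into the destination object, which is why moving and copying cost about the same here.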
Also, if we search the assembly for both a call to push_back and a call to emplace_back:
// push_back call
void foo(std::vector<type>& vec) {
vec.push_back({"Bob", "pizza", "Smith"});
}
// emplace_back call
void foo(std::vector<type>& vec) {
vec.emplace_back("Bob", "pizza", "Smith");
}
(Assembly not copied here. It's massive. std::string is complicated.)
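We can see that emplace_back has calls to strlen, whereas push_back does not. With emplace_back, the string literal is forwarded through several layers of library code before it reaches the std::string constructor. Since the distance between the string literal and the std::string being constructed is increased, the compiler was unable to optimize out the call to strlen.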
Explicitly calling the std::string constructor would remove the calls to strlen, but would no longer construct the strings in place, so that doesn't work to speed up emplace_back.
All this said, if we leave the SSO by using long enough strings, the allocation cost completely drowns out these details, so both emplace_back and push_back have the same performance:
[Benchmark chart omitted; the original post links to quick-bench]
If you fix the constructor of type to move its arguments, emplace_back becomes faster in all cases.
struct type {
std::string a;
std::string b;
std::string c;
type(std::string a, std::string b, std::string c)
: a{std::move(a)}
, b{std::move(b)}
, c{std::move(c)}
{}
};
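The effect of that constructor change can be seen by counting copies and moves rather than timing them. This is a minimal sketch under stated assumptions: Traced is a hypothetical wrapper around std::string that counts its own copy and move constructions, copying_type and moving_type mirror the two versions of type shown above, and the helper function names are illustrative only.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical string wrapper that counts copy and move constructions.
struct Traced {
    static int copies;
    static int moves;
    std::string value;
    Traced(const char* s) : value(s) {}
    Traced(const Traced& o) : value(o.value) { ++copies; }
    Traced(Traced&& o) noexcept : value(std::move(o.value)) { ++moves; }
};

int Traced::copies = 0;
int Traced::moves = 0;

// Mirrors the first version of `type`: by-value parameters, then copies.
struct copying_type {
    Traced a, b, c;
    copying_type(Traced a, Traced b, Traced c) : a{a}, b{b}, c{c} {}
};

// Mirrors the fixed version of `type`: by-value parameters, then moves.
struct moving_type {
    Traced a, b, c;
    moving_type(Traced a, Traced b, Traced c)
        : a{std::move(a)}, b{std::move(b)}, c{std::move(c)} {}
};

struct Counts {
    int copies;
    int moves;
};

template <typename T>
Counts count_emplace_back() {
    std::vector<T> vec;
    vec.reserve(1);  // no reallocation, so only the insertion is counted
    Traced::copies = Traced::moves = 0;
    vec.emplace_back("Bob", "pizza", "Smith");
    return {Traced::copies, Traced::moves};
}

template <typename T>
Counts count_push_back() {
    std::vector<T> vec;
    vec.reserve(1);
    Traced::copies = Traced::moves = 0;
    vec.push_back({"Bob", "pizza", "Smith"});
    return {Traced::copies, Traced::moves};
}
```

With these definitions, emplace_back performs three copies for copying_type and three moves for moving_type, while push_back adds three further moves in each case for transferring the temporary into the vector. Counts are not timings, though: for SSO-sized strings a move costs about as much as a copy, so the difference only becomes significant once the strings are heap-allocated.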
SSO case:
[Benchmark chart omitted; the original post links to quick-bench]
Long case:
[Benchmark chart omitted; the original post links to quick-bench]
However, the SSO push_back case slowed down; the compiler seems to emit extra copies.
The optimal version, which uses perfect forwarding, does not suffer from this drawback (note the scale change on the vertical axis):
struct type {
std::string a;
std::string b;
std::string c;
template <typename A, typename B, typename C>
type(A&& a, B&& b, C&& c)
: a{std::forward<A>(a)}
, b{std::forward<B>(b)}
, c{std::forward<C>(c)}
{}
};
[Benchmark chart omitted; the original post links to quick-bench]
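To see why the forwarding constructor avoids the intermediate objects, the same counting approach can be applied to it. Again this is only a sketch: Traced is a hypothetical counting wrapper around std::string, forwarding_type mirrors the perfect-forwarding version of type, and the three helper functions are illustrative names.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical string wrapper that counts copy and move constructions.
struct Traced {
    static int copies;
    static int moves;
    std::string value;
    Traced(const char* s) : value(s) {}
    Traced(const Traced& o) : value(o.value) { ++copies; }
    Traced(Traced&& o) noexcept : value(std::move(o.value)) { ++moves; }
};

int Traced::copies = 0;
int Traced::moves = 0;

// Mirrors the perfect-forwarding version of `type`.
struct forwarding_type {
    Traced a, b, c;
    template <typename A, typename B, typename C>
    forwarding_type(A&& a, B&& b, C&& c)
        : a{std::forward<A>(a)}
        , b{std::forward<B>(b)}
        , c{std::forward<C>(c)}
    {}
};

struct Counts {
    int copies;
    int moves;
};

// Literals are forwarded all the way down to the members' constructors.
Counts count_emplace_back_literals() {
    std::vector<forwarding_type> vec;
    vec.reserve(1);
    Traced::copies = Traced::moves = 0;
    vec.emplace_back("Bob", "pizza", "Smith");
    return {Traced::copies, Traced::moves};
}

// An lvalue is copied, an rvalue is moved, a literal is constructed in place.
Counts count_emplace_back_mixed() {
    std::vector<forwarding_type> vec;
    vec.reserve(1);
    Traced bob("Bob");
    Traced pizza("pizza");
    Traced::copies = Traced::moves = 0;
    vec.emplace_back(bob, std::move(pizza), "Smith");
    return {Traced::copies, Traced::moves};
}

// push_back still builds a temporary and moves it into the vector.
Counts count_push_back_literals() {
    std::vector<forwarding_type> vec;
    vec.reserve(1);
    Traced::copies = Traced::moves = 0;
    vec.push_back({"Bob", "pizza", "Smith"});
    return {Traced::copies, Traced::moves};
}
```

With literal arguments, emplace_back performs no copies or moves of the members at all; each member is built directly from its const char* argument. One design caveat worth keeping in mind: an unconstrained forwarding constructor accepts any three arguments, so in real code it is often constrained so that it only participates for types convertible to the members.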