Why do std::string operations perform poorly?

Tags: c++, performance, python, node.js, stl

I ran a test to compare string operations in several languages, in order to choose a language for a server-side application. The results seemed normal until I finally tried C++, which surprised me a lot. So I wonder whether I missed some optimization, and I've come here for help.

The tests are mainly intensive string operations, including concatenation and searching. They were performed on Ubuntu 11.10 amd64 with GCC 4.6.1. The machine is a Dell Optiplex 960 with 4 GB of RAM and a quad-core CPU.

In Python (2.7.2):

    def test():
        x = ""
        limit = 102 * 1024
        while len(x) < limit:
            x += "X"
            if x.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0:
                print("Oh my god, this is impossible!")
        print("x's length is : %d" % len(x))

    test()

which gives result:

    x's length is : 104448

    real    0m8.799s
    user    0m8.769s
    sys     0m0.008s

In Java (OpenJDK-7):

    public class test {
        public static void main(String[] args) {
            int x = 0;
            int limit = 102 * 1024;
            String s = "";
            for (; s.length() < limit;) {
                s += "X";
                if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)
                    System.out.printf("Find!\n");
            }
            System.out.printf("x's length = %d\n", s.length());
        }
    }

which gives result:

    x's length = 104448

    real    0m50.436s
    user    0m50.431s
    sys     0m0.488s

In JavaScript (Node.js 0.6.3):

    function test() {
        var x = "";
        var limit = 102 * 1024;
        while (x.length < limit) {
            x += "X";
            if (x.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0)
                console.log("OK");
        }
        console.log("x's length = " + x.length);
    }

    // A function declaration cannot be invoked with a trailing "()", so call it by name.
    test();

which gives result:

    x's length = 104448

    real    0m3.115s
    user    0m3.084s
    sys     0m0.048s

In C++ (g++ -Ofast):

It's not surprising that Node.js performs better than Python or Java. But I expected libstdc++ to perform much better than Node.js, so its result really surprised me.

    #include <iostream>
    #include <string>
    using namespace std;

    void test()
    {
        int x = 0;
        int limit = 102 * 1024;
        string s("");
        for (; s.size() < limit;) {
            s += "X";
            if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)
                cout << "Find!" << endl;
        }
        cout << "x's length = " << s.size() << endl;
    }

    int main()
    {
        test();
    }

which gives result:

    x's length = 104448

    real    0m5.905s
    user    0m5.900s
    sys     0m0.000s

Brief Summary

OK, now let's see the summary:

JavaScript on Node.js (V8): 3.1s
Python on CPython 2.7.2: 8.8s
C++ with libstdc++: 5.9s
Java on OpenJDK 7: 50.4s

Surprising! I also tried -O2 and -O3 for C++, but nothing helped. C++ reaches only about 50% of the performance of JavaScript on V8, and is not much faster than CPython. Could anyone explain whether I missed some optimization in GCC, or is this just how it is? Thanks a lot.

asked Nov 29 '11 11:11

Wu Shu

1 Answer

It's not that std::string performs poorly (as much as I dislike C++), it's that string handling is so heavily optimized for those other languages.

Your comparisons of string performance are misleading, and presumptuous if they are intended to represent more than just that.

I know for a fact that Python string objects are completely implemented in C, and indeed on Python 2.7, numerous optimizations exist due to the lack of separation between Unicode strings and bytes. If you run this test on Python 3.x, you will find it considerably slower.

JavaScript has numerous heavily optimized implementations. It's to be expected that string handling is excellent here.

Your Java result may be due to improper string handling, or some other poor case. I expect that a Java expert could step in and fix this test with a few changes.

As for your C++ example, I'd expect performance to slightly exceed the Python version. It does the same operations, with less interpreter overhead. This is reflected in your results. Preceding the test with s.reserve(limit); would remove reallocation overhead.
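As a minimal sketch of that `s.reserve(limit)` suggestion: the helper name `build_with_reserve` below is made up for illustration and is not part of the question's code. It runs the same append-and-search loop, but asks for the full capacity up front. Since every iteration still searches the whole string, the search probably dominates the run time, so the gain from `reserve` alone may be small.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): the question's loop,
// with the final capacity reserved before the first append.
std::string build_with_reserve(std::size_t limit) {
    std::string s;
    s.reserve(limit);               // request room for `limit` characters once
    assert(s.capacity() >= limit);  // reserve guarantees at least this much

    while (s.size() < limit) {
        s += 'X';  // no reallocation while size() stays within capacity()
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ") != std::string::npos)
            break;  // never happens for a string made only of 'X'
    }
    return s;
}
```

Calling `build_with_reserve(102 * 1024)` from `main` and printing the returned string's `size()` reproduces the original test.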

I'll repeat that you're only testing a single facet of the languages' implementations. The results for this test do not reflect the overall language speed.

I've provided a C version to show how silly such pissing contests can be:

    #define _GNU_SOURCE
    #include <string.h>
    #include <stdio.h>

    void test()
    {
        int limit = 102 * 1024;
        char s[limit];
        size_t size = 0;
        while (size < limit) {
            s[size++] = 'X';
            if (memmem(s, size, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
                fprintf(stderr, "zomg\n");
                return;
            }
        }
        printf("x's length = %zu\n", size);
    }

    int main()
    {
        test();
        return 0;
    }

Timing:

    matt@stanley:~/Desktop$ time ./smash
    x's length = 104448

    real    0m0.681s
    user    0m0.680s
    sys     0m0.000s
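The C version differs from the C++ test in two ways: it writes into a fixed buffer, and it searches with `memmem` instead of `std::string::find`. One way to check how much of the gap comes from the search routine is to keep `std::string` and change only the search call. The sketch below does that. It assumes a platform where `memmem` is available (it is a GNU/BSD extension, not standard C++), and the helper name `build_with_memmem` is hypothetical. No timing is claimed for it here; it would have to be measured on the same machine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string.h>  // memmem: GNU/BSD extension, declared by default under g++

// Hypothetical helper: the question's loop on a std::string, but searching
// the string's buffer with memmem as the C version does.
std::size_t build_with_memmem(std::size_t limit) {
    static const char needle[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const std::size_t needle_len = sizeof(needle) - 1;  // drop the trailing '\0'
    assert(needle_len == 26);  // same length the C version passes explicitly

    std::string s;
    s.reserve(limit);
    while (s.size() < limit) {
        s += 'X';
        if (memmem(s.data(), s.size(), needle, needle_len) != nullptr)
            break;  // never happens for a string made only of 'X'
    }
    return s.size();
}
```

Timing `build_with_memmem(102 * 1024)` against the original `s.find(...)` loop with the same compiler flags shows whether the search implementation, rather than `std::string` itself, accounts for the difference.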

answered Oct 18 '22 04:10

Matt Joiner
