Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do std::string operations perform poorly?

I made a test to compare string operations in several languages for choosing a language for the server-side application. The results seemed normal until I finally tried C++, which surprised me a lot. So I wonder if I had missed any optimization and come here for help.

The test are mainly intensive string operations, including concatenate and searching. The test is performed on Ubuntu 11.10 amd64, with GCC's version 4.6.1. The machine is Dell Optiplex 960, with 4G RAM, and Quad-core CPU.

in Python (2.7.2):

def test():     x = ""     limit = 102 * 1024     while len(x) < limit:         x += "X"         if x.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0:             print("Oh my god, this is impossible!")     print("x's length is : %d" % len(x))  test() 

which gives result:

x's length is : 104448  real    0m8.799s user    0m8.769s sys     0m0.008s 

in Java (OpenJDK-7):

public class test {     public static void main(String[] args) {         int x = 0;         int limit = 102 * 1024;         String s="";         for (; s.length() < limit;) {             s += "X";             if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)             System.out.printf("Find!\n");         }         System.out.printf("x's length = %d\n", s.length());     } } 

which gives result:

x's length = 104448  real    0m50.436s user    0m50.431s sys     0m0.488s 

in Javascript (Nodejs 0.6.3)

function test() {     var x = "";     var limit = 102 * 1024;     while (x.length < limit) {         x += "X";         if (x.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0)             console.log("OK");     }     console.log("x's length = " + x.length); }(); 

which gives result:

x's length = 104448  real    0m3.115s user    0m3.084s sys     0m0.048s 

in C++ (g++ -Ofast)

It's not surprising that Nodejs performas better than Python or Java. But I expected libstdc++ would give much better performance than Nodejs, whose result really suprised me.

#include <iostream> #include <string> using namespace std; void test() {     int x = 0;     int limit = 102 * 1024;     string s("");     for (; s.size() < limit;) {         s += "X";         if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)             cout << "Find!" << endl;     }     cout << "x's length = " << s.size() << endl; }  int main() {     test(); } 

which gives result:

x length = 104448  real    0m5.905s user    0m5.900s sys     0m0.000s 

Brief Summary

OK, now let's see the summary:

  • javascript on Nodejs(V8): 3.1s
  • Python on CPython 2.7.2 : 8.8s
  • C++ with libstdc++: 5.9s
  • Java on OpenJDK 7: 50.4s

Surprisingly! I tried "-O2, -O3" in C++ but noting helped. C++ seems about only 50% performance of javascript in V8, and even poor than CPython. Could anyone explain to me if I had missed some optimization in GCC or is this just the case? Thank you a lot.

like image 1000
Wu Shu Avatar asked Nov 29 '11 11:11

Wu Shu


People also ask

Is std::string good?

In general you should always use std::string, since it is less bug prone. Be aware, that memory overhead of std::string is significant. Recently I've performed some experiments about std::string overhead. In general it is about 48 bytes!

Are C strings faster than std::string?

C-strings are usually faster, because they do not call malloc/new. But there are cases where std::string is faster. Function strlen() is O(N), but std::string::size() is O(1). Also when you search for substring, in C strings you need to check for '\0' on every cycle, in std::string - you don't.

Why do we use std::string?

std::string class in C++ C++ has in its definition a way to represent a sequence of characters as an object of the class. This class is called std:: string. String class stores the characters as a sequence of bytes with the functionality of allowing access to the single-byte character.

What is the difference between a C++ string and an std::string?

There is no functionality difference between string and std::string because they're the same type.


1 Answers

It's not that std::string performs poorly (as much as I dislike C++), it's that string handling is so heavily optimized for those other languages.

Your comparisons of string performance are misleading, and presumptuous if they are intended to represent more than just that.

I know for a fact that Python string objects are completely implemented in C, and indeed on Python 2.7, numerous optimizations exist due to the lack of separation between unicode strings and bytes. If you ran this test on Python 3.x you will find it considerably slower.

Javascript has numerous heavily optimized implementations. It's to be expected that string handling is excellent here.

Your Java result may be due to improper string handling, or some other poor case. I expect that a Java expert could step in and fix this test with a few changes.

As for your C++ example, I'd expect performance to slightly exceed the Python version. It does the same operations, with less interpreter overhead. This is reflected in your results. Preceding the test with s.reserve(limit); would remove reallocation overhead.

I'll repeat that you're only testing a single facet of the languages' implementations. The results for this test do not reflect the overall language speed.

I've provided a C version to show how silly such pissing contests can be:

#define _GNU_SOURCE #include <string.h> #include <stdio.h>  void test() {     int limit = 102 * 1024;     char s[limit];     size_t size = 0;     while (size < limit) {         s[size++] = 'X';         if (memmem(s, size, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {             fprintf(stderr, "zomg\n");             return;         }     }     printf("x's length = %zu\n", size); }  int main() {     test();     return 0; } 

Timing:

matt@stanley:~/Desktop$ time ./smash  x's length = 104448  real    0m0.681s user    0m0.680s sys     0m0.000s 
like image 68
Matt Joiner Avatar answered Oct 18 '22 04:10

Matt Joiner