I wrote a test to compare string operations in several languages, to help choose a language for a server-side application. The results seemed normal until I finally tried C++, which surprised me a lot. So I wonder whether I have missed some optimization, and I've come here for help.
The test consists mainly of intensive string operations, including concatenation and searching. It was performed on Ubuntu 11.10 amd64 with GCC 4.6.1, on a Dell Optiplex 960 with 4 GB of RAM and a quad-core CPU.
Python:

def test():
    x = ""
    limit = 102 * 1024
    while len(x) < limit:
        x += "X"
        if x.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0:
            print("Oh my god, this is impossible!")
    print("x's length is : %d" % len(x))

test()
which gives this result:
x's length is : 104448

real    0m8.799s
user    0m8.769s
sys     0m0.008s
Java:

public class test {
    public static void main(String[] args) {
        int limit = 102 * 1024;
        String s = "";
        while (s.length() < limit) {
            s += "X";
            if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)
                System.out.printf("Find!\n");
        }
        System.out.printf("x's length = %d\n", s.length());
    }
}
which gives this result:
x's length = 104448

real    0m50.436s
user    0m50.431s
sys     0m0.488s
JavaScript (Node.js):

function test() {
    var x = "";
    var limit = 102 * 1024;
    while (x.length < limit) {
        x += "X";
        if (x.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0)
            console.log("OK");
    }
    console.log("x's length = " + x.length);
}
test();
which gives this result:
x's length = 104448

real    0m3.115s
user    0m3.084s
sys     0m0.048s
It's not surprising that Node.js performs better than Python or Java. But I expected libstdc++ to give much better performance than Node.js, and the result really surprised me.
C++:

#include <iostream>
#include <string>
using namespace std;

void test() {
    size_t limit = 102 * 1024;
    string s;
    while (s.size() < limit) {
        s += "X";
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)
            cout << "Find!" << endl;
    }
    cout << "x's length = " << s.size() << endl;
}

int main() {
    test();
}
which gives this result:
x's length = 104448

real    0m5.905s
user    0m5.900s
sys     0m0.000s
OK, now let's see the summary:

Language               real         user         sys
Python (CPython)       0m8.799s     0m8.769s     0m0.008s
Java                   0m50.436s    0m50.431s    0m0.488s
JavaScript (Node.js)   0m3.115s     0m3.084s     0m0.048s
C++ (libstdc++)        0m5.905s     0m5.900s     0m0.000s
Surprising! I tried -O2 and -O3 with GCC, but nothing helped. C++ manages only about half the performance of JavaScript on V8, and is not even twice as fast as CPython. Could anyone explain whether I have missed some optimization in GCC, or is this just how it is? Thank you a lot.
It's not that std::string performs poorly (as much as I dislike C++), it's that string handling is so heavily optimized for those other languages.
Your comparisons of string performance are misleading, and presumptuous if they are intended to represent more than just that.
I know for a fact that Python string objects are completely implemented in C, and indeed on Python 2.7, numerous optimizations exist due to the lack of separation between unicode strings and bytes. If you run this test on Python 3.x, you will find it considerably slower.
JavaScript has numerous heavily optimized implementations. It's to be expected that string handling is excellent here.
Your Java result may be due to improper string handling, or some other pathological case. I expect a Java expert could step in and fix this test with a few changes.
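My guess (not something measured in the original post) is that s += "X" is the culprit: java.lang.String is immutable, so every += copies the entire string so far. A sketch of the test rebuilt around StringBuilder, which appends in place; the class name test2 is just a placeholder:

// Hypothetical reworking of the Java test; not run by the original poster.
public class test2 {
    public static void main(String[] args) {
        int limit = 102 * 1024;
        // Pre-sized, mutable buffer: append() works in place instead of copying.
        StringBuilder s = new StringBuilder(limit);
        while (s.length() < limit) {
            s.append('X');
            if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)
                System.out.printf("Find!\n");
        }
        System.out.printf("x's length = %d\n", s.length());
    }
}

The indexOf still rescans the buffer on every iteration, so the search cost remains; only the quadratic copying from += goes away.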
As for your C++ example, I'd expect performance to slightly exceed the Python version. It does the same operations, with less interpreter overhead. This is reflected in your results. Preceding the test with s.reserve(limit); would remove reallocation overhead.
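As a minimal sketch of that suggestion, here is the question's C++ test with the reserve call as the only change (the timing would need to be re-measured; the find still rescans the string each iteration):

#include <iostream>
#include <string>
using namespace std;

void test() {
    size_t limit = 102 * 1024;
    string s;
    s.reserve(limit);  // allocate the full buffer up front, so += never reallocates
    while (s.size() < limit) {
        s += "X";
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)
            cout << "Find!" << endl;
    }
    cout << "x's length = " << s.size() << endl;
}

int main() {
    test();
}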
I'll repeat that you're only testing a single facet of the languages' implementations. The results for this test do not reflect the overall language speed.
I've provided a C version to show how silly such pissing contests can be:
#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

void test() {
    int limit = 102 * 1024;
    char s[limit];
    size_t size = 0;
    while (size < limit) {
        s[size++] = 'X';
        if (memmem(s, size, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
            fprintf(stderr, "zomg\n");
            return;
        }
    }
    printf("x's length = %zu\n", size);
}

int main() {
    test();
    return 0;
}
Timing:
matt@stanley:~/Desktop$ time ./smash
x's length = 104448

real    0m0.681s
user    0m0.680s
sys     0m0.000s