Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ performance challenge: integer to std::string conversion

People also ask

Can we use stoi in C?

What Is stoi() in C++? In C++, the stoi() function converts a string to an integer value. The function is shorthand for “string to integer,” and C++ programmers use it to parse integers out of strings. The stoi() function is relatively new, as it was only added to the language as of its latest revision (C++11) in 2011.

Which conversion function converts a string into a number?

Convert String To Numeric Types In C++ In general, there are two common methods to convert string to numbers in C++. Using stoi and atoi functions that replicate for all the numeric data types.


#include <string>

const char digit_pairs[201] = {
  "00010203040506070809"
  "10111213141516171819"
  "20212223242526272829"
  "30313233343536373839"
  "40414243444546474849"
  "50515253545556575859"
  "60616263646566676869"
  "70717273747576777879"
  "80818283848586878889"
  "90919293949596979899"
};


std::string& itostr(int n, std::string& s)
{
    if(n==0)
    {
        s="0";
        return s;
    }

    int sign = -(n<0);
    unsigned int val = (n^sign)-sign;

    int size;
    if(val>=10000)
    {
        if(val>=10000000)
        {
            if(val>=1000000000)
                size=10;
            else if(val>=100000000)
                size=9;
            else 
                size=8;
        }
        else
        {
            if(val>=1000000)
                size=7;
            else if(val>=100000)
                size=6;
            else
                size=5;
        }
    }
    else 
    {
        if(val>=100)
        {
            if(val>=1000)
                size=4;
            else
                size=3;
        }
        else
        {
            if(val>=10)
                size=2;
            else
                size=1;
        }
    }
    size -= sign;
    s.resize(size);
    char* c = &s[0];
    if(sign)
        *c='-';

    c += size-1;
    while(val>=100)
    {
       int pos = val % 100;
       val /= 100;
       *(short*)(c-1)=*(short*)(digit_pairs+2*pos); 
       c-=2;
    }
    while(val>0)
    {
        *c--='0' + (val % 10);
        val /= 10;
    }
    return s;
}

std::string& itostr(unsigned val, std::string& s)
{
    if(val==0)
    {
        s="0";
        return s;
    }

    int size;
    if(val>=10000)
    {
        if(val>=10000000)
        {
            if(val>=1000000000)
                size=10;
            else if(val>=100000000)
                size=9;
            else 
                size=8;
        }
        else
        {
            if(val>=1000000)
                size=7;
            else if(val>=100000)
                size=6;
            else
                size=5;
        }
    }
    else 
    {
        if(val>=100)
        {
            if(val>=1000)
                size=4;
            else
                size=3;
        }
        else
        {
            if(val>=10)
                size=2;
            else
                size=1;
        }
    }

    s.resize(size);
    char* c = &s[size-1];
    while(val>=100)
    {
       int pos = val % 100;
       val /= 100;
       *(short*)(c-1)=*(short*)(digit_pairs+2*pos); 
       c-=2;
    }
    while(val>0)
    {
        *c--='0' + (val % 10);
        val /= 10;
    }
    return s;
}

This will blow up on systems that disallow unaligned memory accesses (in which case, the first unaligned assignment via *(short*) would cause a segfault), but should work very nicely otherwise.

One important thing to do is to minimize the use of std::string. (Ironic, I know.) In Visual Studio, for example, most calls to methods of std::string are not inlined, even if you specify /Ob2 in compiler options. So even something as trivial as a call to std::string::clear(), which you might expect to be very fast, can take 100 clockticks when linking CRT as a static library, and as much as 300 clockticks when linking as a DLL.

For the same reason, returning by reference is better because it avoids an assignment, a constructor and a destructor.


Ah, awesome challenge by the way... I've had a lot of fun with this.

I have two algorithms to submit (code is at the bottom if you feel like skipping to it). In my comparisons I require that the function return a string and that it can handle int and unsigned int. Comparing things that don't construct a string to those that do doesn't really make sense.

The first one is a fun implementation that doesn't use any precomputed lookup tables or explicit division/modulo. This one is competitive with the others with gcc and with all but Timo's on msvc (for a good reason that I explain below). The second algorithm is my actual submission for highest performance. In my tests it beats all the others on both gcc and msvc.

I think I know why some of the results on MSVC are very good. std::string has two relevant constructors std::string(char* str, size_t n)
and
std::string(ForwardIterator b, ForwardIterator e)
gcc does the same thing for both of them... that is it uses the second to implement the first. The first constructor can be implemented significantly more efficiently than that and MSVC does so. The side benefit of this is that in some cases (like my fast code and Timo's code) the string constructor can be inlined. In fact, just switching between these constructors in MSVC is almost a 2x difference for my code.

My performance testing results:

Code Sources:

- Voigt
- Timo
- ergosys
- user434507
- user-voigt-timo
- hopman-fun
- hopman-fast

gcc 4.4.5 -O2 on Ubuntu 10.10 64-bit, Core i5

hopman_fun: 124.688  MB/sec --- 8.020 s
hopman_fast: 137.552  MB/sec --- 7.270 s
voigt: 120.192  MB/sec --- 8.320 s
user_voigt_timo: 97.9432  MB/sec --- 10.210 s
timo: 120.482  MB/sec --- 8.300 s
user: 97.7517  MB/sec --- 10.230 s
ergosys: 101.42  MB/sec --- 9.860 s

MSVC 2010 64-bit /Ox on Windows 7 64-bit, Core i5

hopman_fun: 127  MB/sec --- 7.874 s
hopman_fast: 259  MB/sec --- 3.861 s
voigt: 221.435  MB/sec --- 4.516 s
user_voigt_timo: 195.695  MB/sec --- 5.110 s
timo: 253.165  MB/sec --- 3.950 s
user: 212.63  MB/sec --- 4.703 s
ergosys: 78.0518  MB/sec --- 12.812 s

Here are some results and a testing/timing framework on ideone
http://ideone.com/XZRqp
Note that ideone is a 32-bit environment. Both of my algorithms suffer from that, but hopman_fast is at least still competetive.

Note that for those the two or so that don't construct a string I added the following function template:

template <typename T>
std::string itostr(T t) {
    std::string ret;
    itostr(t, ret);
    return ret;
}

Now for my code...first the fun one:

    // hopman_fun

template <typename T> 
T reduce2(T v) {
    T k = ((v * 410) >> 12) & 0x000F000F000F000Full;
    return (((v - k * 10) << 8) + k);
}

template <typename T>
T reduce4(T v) {
    T k = ((v * 10486) >> 20) & 0xFF000000FFull;
    return reduce2(((v - k * 100) << 16) + (k));
}

typedef unsigned long long ull;
inline ull reduce8(ull v) {
    ull k = ((v * 3518437209u) >> 45);
    return reduce4(((v - k * 10000) << 32) + (k));
}

template <typename T>
std::string itostr(T o) {
    union {
        char str[16];
        unsigned short u2[8];
        unsigned u4[4];
        unsigned long long u8[2];
    };

    unsigned v = o < 0 ? ~o + 1 : o;

    u8[0] = (ull(v) * 3518437209u) >> 45;
    u8[0] = (u8[0] * 28147497672ull);
    u8[1] = v - u2[3] * 100000000;

    u8[1] = reduce8(u8[1]);
    char* f;
    if (u2[3]) {
        u2[3] = reduce2(u2[3]);
        f = str + 6;
    } else {
        unsigned short* k = u4[2] ? u2 + 4 : u2 + 6;
        f = *k ? (char*)k : (char*)(k + 1);
    }
    if (!*f) f++;

    u4[1] |= 0x30303030;
    u4[2] |= 0x30303030;
    u4[3] |= 0x30303030;
    if (o < 0) *--f = '-';
    return std::string(f, (str + 16) - f);
}

And then the fast one:

    // hopman_fast

struct itostr_helper {
    static unsigned out[10000];

    itostr_helper() {
        for (int i = 0; i < 10000; i++) {
            unsigned v = i;
            char * o = (char*)(out + i);
            o[3] = v % 10 + '0';
            o[2] = (v % 100) / 10 + '0';
            o[1] = (v % 1000) / 100 + '0';
            o[0] = (v % 10000) / 1000;
            if (o[0]) o[0] |= 0x30;
            else if (o[1] != '0') o[0] |= 0x20;
            else if (o[2] != '0') o[0] |= 0x10;
            else o[0] |= 0x00;
        }
    }
};
unsigned itostr_helper::out[10000];

itostr_helper hlp_init;

template <typename T>
std::string itostr(T o) {
    typedef itostr_helper hlp;

    unsigned blocks[3], *b = blocks + 2;
    blocks[0] = o < 0 ? ~o + 1 : o;
    blocks[2] = blocks[0] % 10000; blocks[0] /= 10000;
    blocks[2] = hlp::out[blocks[2]];

    if (blocks[0]) {
        blocks[1] = blocks[0] % 10000; blocks[0] /= 10000;
        blocks[1] = hlp::out[blocks[1]];
        blocks[2] |= 0x30303030;
        b--;
    }

    if (blocks[0]) {
        blocks[0] = hlp::out[blocks[0] % 10000];
        blocks[1] |= 0x30303030;
        b--;
    }

    char* f = ((char*)b);
    f += 3 - (*f >> 4);

    char* str = (char*)blocks;
    if (o < 0) *--f = '-';
    return std::string(f, (str + 12) - f);
}

Benchmark data for the code provided in the question:

On ideone (gcc 4.3.4):

  • stringstreams: 4.4 MB/s
  • sprintf: 25.0 MB/s
  • mine (Ben Voigt): 55.8 MB/s
  • Timo: 58.5 MB/s
  • user434507: 199 MB/s
  • user434507's Ben-Timo-507 hybrid: 263 MB/s

Core i7, Windows 7 64-bit, 8 GB RAM, Visual C++ 2010 32-bit:

cl /Ox /EHsc

  • stringstreams: 3.39 MB/s, 3.67 MB/s
  • sprintf: 16.8 MB/s, 16.2 MB/s
  • mine: 194 MB/s, 207 MB/s (with PGO enabled: 250 MB/s)

Core i7, Windows 7 64-bit, 8 GB RAM, Visual C++ 2010 64-bit:

cl /Ox /EHsc

  • stringstreams: 4.42 MB/s, 4.92 MB/s
  • sprintf: 21.0 MB/s, 20.8 MB/s
  • mine: 238 MB/s, 228 MB/s

Core i7, Windows 7 64-bit, 8 GB RAM, cygwin gcc 4.3.4:

g++ -O3

  • stringstreams: 2.19 MB/s, 2.17 MB/s
  • sprintf: 13.1 MB/s, 13.4 MB/s
  • mine: 30.0 MB/s, 30.2 MB/s

edit: I was gonna add my own answer, but the question was was closed so I'm adding it here. :) I wrote my own algorithm and managed to get a decent improvement over Ben's code, though I only tested it in MSVC 2010. I also made a benchmark of all the implementations presented so far, using the same testing setup that was in Ben's original code. -- Timo

Intel Q9450, Win XP 32bit, MSVC 2010

cl /O2 /EHsc

  • stringstream: 2.87 MB/s
  • sprintf: 16.1 MB/s
  • Ben: 202 MB/s
  • Ben (unsigned buffer): 82.0 MB/s
  • ergosys (updated version): 64.2 MB/s
  • user434507: 172 MB/s
  • Timo: 241 MB/s

-

const char digit_pairs[201] = {
  "00010203040506070809"
  "10111213141516171819"
  "20212223242526272829"
  "30313233343536373839"
  "40414243444546474849"
  "50515253545556575859"
  "60616263646566676869"
  "70717273747576777879"
  "80818283848586878889"
  "90919293949596979899"
};

static const int BUFFER_SIZE = 11;

std::string itostr(int val)
{
  char buf[BUFFER_SIZE];
  char *it = &buf[BUFFER_SIZE-2];

  if(val>=0) {
    int div = val/100;
    while(div) {
      memcpy(it,&digit_pairs[2*(val-div*100)],2);
      val = div;
      it-=2;
      div = val/100;
    }
    memcpy(it,&digit_pairs[2*val],2);
    if(val<10)
      it++;
  } else {
    int div = val/100;
    while(div) {
      memcpy(it,&digit_pairs[-2*(val-div*100)],2);
      val = div;
      it-=2;
      div = val/100;
    }
    memcpy(it,&digit_pairs[-2*val],2);
    if(val<=-10)
      it--;
    *it = '-';
  }

  return std::string(it,&buf[BUFFER_SIZE]-it);
}

std::string itostr(unsigned int val)
{
  char buf[BUFFER_SIZE];
  char *it = (char*)&buf[BUFFER_SIZE-2];

  int div = val/100;
  while(div) {
    memcpy(it,&digit_pairs[2*(val-div*100)],2);
    val = div;
    it-=2;
    div = val/100;
  }
  memcpy(it,&digit_pairs[2*val],2);
  if(val<10)
    it++;

  return std::string((char*)it,(char*)&buf[BUFFER_SIZE]-(char*)it);
}

While the info we get here for the algorithms is pretty nice, I think the question is "broken", and I'll explain why I think this:

The question asks to take the performance of int->std::string conversion, and this may be of interest when comparing a commonly available method, such as different stringstream implementations or boost::lexical_cast. It does not, however, make sense when asking for new code, a specialized algorithm, to do this. The reason is that int2string will always involve heap allocation from std::string and if we are trying to squeeze the last out of our conversion algorithm, I do not think it makes sense to mix these measurements up with the heap allocations done by std::string. If I want performant conversion I will always use a fixed size buffer and certainly never allocate anything on the heap!

To sum up, I think the timings should be split:

  • First, fastest (int -> fixed buffer) conversion.
  • Second, timing of (fixed buffer -> std::string) copy.
  • Third, checking how the std::string allocation can directly be used as buffer, to save the copying.

These aspects should not be mixed up in one timing, IMHO.


I can't test under VS, but this seems to be faster than your code for g++, about 10%. It could probably be tuned, the decision values chosen are guesses. int only, sorry.

typedef unsigned buf_t; 

static buf_t * reduce(unsigned val, buf_t * stp) {
   unsigned above = val / 10000; 
   if (above != 0) {
      stp = reduce(above, stp); 
      val -= above * 10000; 
   }

   buf_t digit  = val / 1000; 
   *stp++ = digit + '0'; 
   val -= digit * 1000; 

   digit  = val / 100; 
   *stp++ = digit + '0'; 
   val -= digit * 100; 

   digit  = val / 10; 
   *stp++ = digit + '0'; 
   val -= digit * 10; 
   *stp++ = val + '0'; 
   return stp; 
}

std::string itostr(int input) {

   buf_t buf[16]; 


   if(input == INT_MIN) {  
      char buf2[16]; 
      std::sprintf(buf2, "%d", input); 
      return std::string(buf2); 
   }

   // handle negative
   unsigned val = input;
   if(input < 0) 
      val = -input;

   buf[0] = '0'; 
   buf_t* endp = reduce(val, buf+1); 
   *endp = 127; 

   buf_t * stp = buf+1; 
   while (*stp == '0') 
      stp++;
   if (stp == endp)
      stp--; 

   if (input < 0) { 
      stp--; 
      *stp = '-'; 
   }
   return std::string(stp, endp); 
}

Updated user2985907's answer... modp_ufast...

Integer To String Test (Type 1)
[modp_ufast]Numbers: 240000000  Total:   657777786      Time:  1.1633sec        Rate:206308473.0686nums/sec
[sprintf] Numbers: 240000000    Total:   657777786      Time: 24.3629sec        Rate:  9851045.8556nums/sec
[karma]   Numbers: 240000000    Total:   657777786      Time:  5.2389sec        Rate: 45810870.7171nums/sec
[strtk]   Numbers: 240000000    Total:   657777786      Time:  3.3126sec        Rate: 72450283.7492nums/sec
[so   ]   Numbers: 240000000    Total:   657777786      Time:  3.0828sec        Rate: 77852152.8820nums/sec
[timo ]   Numbers: 240000000    Total:   657777786      Time:  4.7349sec        Rate: 50687912.9889nums/sec
[voigt]   Numbers: 240000000    Total:   657777786      Time:  5.1689sec        Rate: 46431985.1142nums/sec
[hopman]  Numbers: 240000000    Total:   657777786      Time:  4.6169sec        Rate: 51982554.6497nums/sec
Press any key to continue . . .

Integer To String Test(Type 2)
[modp_ufast]Numbers: 240000000  Total:   660000000      Time:  0.5072sec        Rate:473162716.4618nums/sec
[sprintf] Numbers: 240000000    Total:   660000000      Time: 22.3483sec        Rate: 10739062.9383nums/sec
[karma]   Numbers: 240000000    Total:   660000000      Time:  4.2471sec        Rate: 56509024.3035nums/sec
[strtk]   Numbers: 240000000    Total:   660000000      Time:  2.1683sec        Rate:110683636.7123nums/sec
[so   ]   Numbers: 240000000    Total:   660000000      Time:  2.7133sec        Rate: 88454602.1423nums/sec
[timo ]   Numbers: 240000000    Total:   660000000      Time:  2.8030sec        Rate: 85623453.3872nums/sec
[voigt]   Numbers: 240000000    Total:   660000000      Time:  3.4019sec        Rate: 70549286.7776nums/sec
[hopman]  Numbers: 240000000    Total:   660000000      Time:  2.7849sec        Rate: 86178023.8743nums/sec
Press any key to continue . . .

Integer To String Test (type 3)
[modp_ufast]Numbers: 240000000  Total:   505625000      Time:  1.6482sec        Rate:145610315.7819nums/sec
[sprintf] Numbers: 240000000    Total:   505625000      Time: 20.7064sec        Rate: 11590618.6109nums/sec
[karma]   Numbers: 240000000    Total:   505625000      Time:  4.3036sec        Rate: 55767734.3570nums/sec
[strtk]   Numbers: 240000000    Total:   505625000      Time:  2.9297sec        Rate: 81919227.9275nums/sec
[so   ]   Numbers: 240000000    Total:   505625000      Time:  3.0278sec        Rate: 79266003.8158nums/sec
[timo ]   Numbers: 240000000    Total:   505625000      Time:  4.0631sec        Rate: 59068204.3266nums/sec
[voigt]   Numbers: 240000000    Total:   505625000      Time:  4.5616sec        Rate: 52613393.0285nums/sec
[hopman]  Numbers: 240000000    Total:   505625000      Time:  4.1248sec        Rate: 58184194.4569nums/sec
Press any key to continue . . .

int ufast_utoa10(unsigned int value, char* str)
{
#define JOIN(N) N "0", N "1", N "2", N "3", N "4", N "5", N "6", N "7", N "8", N "9"
#define JOIN2(N) JOIN(N "0"), JOIN(N "1"), JOIN(N "2"), JOIN(N "3"), JOIN(N "4"), \
                 JOIN(N "5"), JOIN(N "6"), JOIN(N "7"), JOIN(N "8"), JOIN(N "9")
#define JOIN3(N) JOIN2(N "0"), JOIN2(N "1"), JOIN2(N "2"), JOIN2(N "3"), JOIN2(N "4"), \
                 JOIN2(N "5"), JOIN2(N "6"), JOIN2(N "7"), JOIN2(N "8"), JOIN2(N "9")
#define JOIN4    JOIN3("0"), JOIN3("1"), JOIN3("2"), JOIN3("3"), JOIN3("4"), \
                 JOIN3("5"), JOIN3("6"), JOIN3("7"), JOIN3("8"), JOIN3("9")
#define JOIN5(N) JOIN(N), JOIN(N "1"), JOIN(N "2"), JOIN(N "3"), JOIN(N "4"), \
                 JOIN(N "5"), JOIN(N "6"), JOIN(N "7"), JOIN(N "8"), JOIN(N "9")
#define JOIN6    JOIN5(), JOIN5("1"), JOIN5("2"), JOIN5("3"), JOIN5("4"), \
                 JOIN5("5"), JOIN5("6"), JOIN5("7"), JOIN5("8"), JOIN5("9")
#define F(N)     ((N) >= 100 ? 3 : (N) >= 10 ? 2 : 1)
#define F10(N)   F(N),F(N+1),F(N+2),F(N+3),F(N+4),F(N+5),F(N+6),F(N+7),F(N+8),F(N+9)
#define F100(N)  F10(N),F10(N+10),F10(N+20),F10(N+30),F10(N+40),\
                 F10(N+50),F10(N+60),F10(N+70),F10(N+80),F10(N+90)
  static const short offsets[] = { F100(0), F100(100), F100(200), F100(300), F100(400),
                                  F100(500), F100(600), F100(700), F100(800), F100(900)};
  static const char table1[][4] = { JOIN("") }; 
  static const char table2[][4] = { JOIN2("") }; 
  static const char table3[][4] = { JOIN3("") };
  static const char table4[][5] = { JOIN4 }; 
  static const char table5[][4] = { JOIN6 };
#undef JOIN
#undef JOIN2
#undef JOIN3
#undef JOIN4
  char *wstr;
  int remains[2];
  unsigned int v2;
  if (value >= 100000000) {
    v2 = value / 10000;
    remains[0] = value - v2 * 10000;
    value = v2;
    v2 = value / 10000;
    remains[1] = value - v2 * 10000;
    value = v2;
    wstr = str;
    if (value >= 1000) {
      *(__int32 *) wstr = *(__int32 *) table4[value];
      wstr += 4;
    } else {
      *(__int32 *) wstr = *(__int32 *) table5[value];
      wstr += offsets[value];
    }
    *(__int32 *) wstr = *(__int32 *) table4[remains[1]];
    wstr += 4;
    *(__int32 *) wstr = *(__int32 *) table4[remains[0]];
    wstr += 4;
    *wstr = 0;
    return (wstr - str);
  }
  else if (value >= 10000) {
    v2 = value / 10000;
    remains[0] = value - v2 * 10000;
    value = v2;
    wstr = str;
    if (value >= 1000) {
      *(__int32 *) wstr = *(__int32 *) table4[value];
      wstr += 4;
      *(__int32 *) wstr = *(__int32 *) table4[remains[0]];
      wstr += 4;
      *wstr = 0;
      return 8;
    } else {
      *(__int32 *) wstr = *(__int32 *) table5[value];
      wstr += offsets[value];
      *(__int32 *) wstr = *(__int32 *) table4[remains[0]];
      wstr += 4;
      *wstr = 0;
      return (wstr - str);
    }
  }
  else {
    if (value >= 1000) {
      *(__int32 *) str = *(__int32 *) table4[value];
      str += 4;
      *str = 0;
      return 4;
    } else if (value >= 100) {
      *(__int32 *) str = *(__int32 *) table3[value];
      return 3;
    } else if (value >= 10) {
      *(__int16 *) str = *(__int16 *) table2[value];
      str += 2;
      *str = 0;
      return 2;
    } else {
      *(__int16 *) str = *(__int16 *) table1[value];
      return 1;
    }
  }
}

int ufast_itoa10(int value, char* str) {
  if (value < 0) { *(str++) = '-'; 
    return ufast_utoa10(-value, str) + 1; 
  }
  else return ufast_utoa10(value, str);
}


    void ufast_test() {

   print_mode("[modp_ufast]");

   std::string s;
   s.reserve(32);
   std::size_t total_length = 0;
   strtk::util::timer t;
   t.start();

   char buf[128];
   int len;
   for (int i = (-max_i2s / 2); i < (max_i2s / 2); ++i)
   {
      #ifdef enable_test_type01
      s.resize(ufast_itoa10(((i & 1) ? i : -i), const_cast<char*>(s.c_str())));
      total_length += s.size();
      #endif

      #ifdef enable_test_type02
      s.resize(ufast_itoa10(max_i2s + i, const_cast<char*>(s.c_str())));
      total_length += s.size();
      #endif

      #ifdef enable_test_type03
      s.resize(ufast_itoa10(randval[(max_i2s + i) & 1023], const_cast<char*>(s.c_str())));
      total_length += s.size();
      #endif
   }
   t.stop();
   printf("Numbers:%10lu\tTotal:%12lu\tTime:%8.4fsec\tRate:%14.4fnums/sec\n",
          static_cast<unsigned long>(3 * max_i2s),
          static_cast<unsigned long>(total_length),
          t.time(),
          (3.0 * max_i2s) / t.time());
}