Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the fastest way to build a string in Ruby?

In Ternary operator, a person wanting to join ["foo", "bar", "baz"] with commas and an "and" cited The Ruby Cookbook as saying

If efficiency is important to you, don't build a new string when you can append items onto an existing string. [And so on]... Use str << var1 << ' ' << var2 instead.

But the book was written in 2006.

Is using appending (ie <<) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?

like image 966
Andrew Grimm Avatar asked Jul 01 '11 14:07

Andrew Grimm


2 Answers

What if your source of string bits is not an array?

TLDR; even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both array.join & <<


Old hands at benchmarking may have looked at @Phrogz answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...

    WORDS = (1..1000).map{ rand(10000).to_s }
    N = 1000

    require 'benchmark'
    Benchmark.bmbm do |x|
      x.report("Array#join"){
        N.times{ s = WORDS.join(', ') }
      }
      x.report("Array#join 2"){
        N.times{
          arr = Array.new(WORDS.length)
          arr[0] = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; }
          s = WORDS.join(', ')
        }
      }
      x.report("String#+ 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
        }
      }
      x.report("String#+ 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
        }
      }
      x.report("String#<< 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
        }
      }
      x.report("String#<< 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
        }
      }
      x.report("String#<< 2 A"){
        N.times{
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| s << ", " << w }
        }
      }
    end

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.130000   0.000000   0.130000 (  0.128281)
    Array#join 2    0.220000   0.000000   0.220000 (  0.219588)
    String#+ 1      1.720000   0.770000   2.490000 (  2.478555)
    String#+ 2      1.040000   0.370000   1.410000 (  1.407190)
    String#<< 1     0.370000   0.000000   0.370000 (  0.371125)
    String#<< 2     0.360000   0.000000   0.360000 (  0.360161)
    String#<< 2 A   0.310000   0.000000   0.310000 (  0.318130)

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.090000   0.000000   0.090000 (  0.092072)
    Array#join 2    0.180000   0.000000   0.180000 (  0.180423)
    String#+ 1      3.400000   0.750000   4.150000 (  4.149934)
    String#+ 2      1.740000   0.370000   2.110000 (  2.122511)
    String#<< 1     0.360000   0.000000   0.360000 (  0.359707)
    String#<< 2     0.340000   0.000000   0.340000 (  0.343233)
    String#<< 2 A   0.300000   0.000000   0.300000 (  0.297420)

I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:

    WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }

as I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<

words often longer than 23 chars, ruby 2.1.1

                        user     system      total        real
    Array#join      0.150000   0.000000   0.150000 (  0.152846)
    Array#join 2    0.230000   0.010000   0.240000 (  0.231272)
    String#+ 1      7.450000   5.490000  12.940000 ( 12.936776)
    String#+ 2      4.200000   2.590000   6.790000 (  6.791125)
    String#<< 1     0.400000   0.000000   0.400000 (  0.399452)
    String#<< 2     0.380000   0.010000   0.390000 (  0.389791)
    String#<< 2 A   0.340000   0.000000   0.340000 (  0.341099)

words often longer than 23 chars, ruby 1.9.3

                        user     system      total        real
    Array#join      0.130000   0.010000   0.140000 (  0.132957)
    Array#join 2    0.220000   0.000000   0.220000 (  0.220181)
    String#+ 1     20.060000   5.230000  25.290000 ( 25.293366)
    String#+ 2      9.750000   2.670000  12.420000 ( 12.425229)
    String#<< 1     0.390000   0.000000   0.390000 (  0.397733)
    String#<< 2     0.390000   0.000000   0.390000 (  0.390540)
    String#<< 2 A   0.330000   0.000000   0.330000 (  0.333791)
like image 28
Michael Johnston Avatar answered Oct 11 '22 15:10

Michael Johnston


Use Array#join when you can, and String#<< when you can't.

The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:

Ruby 1.9.2p180      user     system      total        real
Array#join      0.320000   0.000000   0.320000 (  0.330224)
String#+ 1      7.730000   0.200000   7.930000 (  8.373900)
String#+ 2      4.670000   0.600000   5.270000 (  5.546633)
String#<< 1     1.260000   0.010000   1.270000 (  1.315991)
String#<< 2     1.600000   0.020000   1.620000 (  1.793415)

JRuby 1.6.1         user     system      total        real
Array#join      0.185000   0.000000   0.185000 (  0.185000)
String#+ 1      9.118000   0.000000   9.118000 (  9.118000)
String#+ 2      4.544000   0.000000   4.544000 (  4.544000)
String#<< 1     0.865000   0.000000   0.865000 (  0.866000)
String#<< 2     0.852000   0.000000   0.852000 (  0.852000)

Ruby 1.8.7p334      user     system      total        real
Array#join      0.290000   0.010000   0.300000 (  0.305367)
String#+ 1      7.620000   0.060000   7.680000 (  7.682265)
String#+ 2      4.820000   0.130000   4.950000 (  4.957258)
String#<< 1     1.290000   0.010000   1.300000 (  1.304764)
String#<< 2     1.350000   0.010000   1.360000 (  1.347226)

Rubinius (head)     user     system      total        real
Array#join      0.864054   0.008001   0.872055 (  0.870757)
String#+ 1      9.636602   0.076005   9.712607 (  9.714820)
String#+ 2      6.456403   0.064004   6.520407 (  6.521633)
String#<< 1     2.196138   0.016001   2.212139 (  2.212564)
String#<< 2     2.176136   0.012001   2.188137 (  2.186298)

Here's the benchmarking code:

WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000

require 'benchmark'
Benchmark.bmbm do |x|
  x.report("Array#join"){
    N.times{ s = WORDS.join(', ') }
  }
  x.report("String#+ 1"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", "; s += w }
    }
  }
  x.report("String#+ 2"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", " + w }
    }
  }
  x.report("String#<< 1"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", "; s << w }
    }
  }
  x.report("String#<< 2"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", " << w }
    }
  }
end

Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.

like image 173
Phrogz Avatar answered Oct 11 '22 13:10

Phrogz