In Ternary operator, a person wanting to join ["foo", "bar", "baz"]
with commas and an "and" cited The Ruby Cookbook as saying
If efficiency is important to you, don't build a new string when you can append items onto an existing string. [And so on]... Use str << var1 << ' ' << var2 instead.
But the book was written in 2006.
Is using appending (ie <<
) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?
TLDR; even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. +
is not as bad in 2.1.1 as 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both array.join
& <<
Old hands at benchmarking may have looked at @Phrogz answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("Array#join 2"){
N.times{
arr = Array.new(WORDS.length)
arr[0] = WORDS.first
WORDS[1..-1].each{ |w| arr << w; }
s = WORDS.join(', ')
}
}
x.report("String#+ 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
}
}
x.report("String#<< 2 A"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
user system total real
Array#join 0.130000 0.000000 0.130000 ( 0.128281)
Array#join 2 0.220000 0.000000 0.220000 ( 0.219588)
String#+ 1 1.720000 0.770000 2.490000 ( 2.478555)
String#+ 2 1.040000 0.370000 1.410000 ( 1.407190)
String#<< 1 0.370000 0.000000 0.370000 ( 0.371125)
String#<< 2 0.360000 0.000000 0.360000 ( 0.360161)
String#<< 2 A 0.310000 0.000000 0.310000 ( 0.318130)
user system total real
Array#join 0.090000 0.000000 0.090000 ( 0.092072)
Array#join 2 0.180000 0.000000 0.180000 ( 0.180423)
String#+ 1 3.400000 0.750000 4.150000 ( 4.149934)
String#+ 2 1.740000 0.370000 2.110000 ( 2.122511)
String#<< 1 0.360000 0.000000 0.360000 ( 0.359707)
String#<< 2 0.340000 0.000000 0.340000 ( 0.343233)
String#<< 2 A 0.300000 0.000000 0.300000 ( 0.297420)
I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:
WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }
as I expected, the impact on +
was quite significant, but I was pleasantly surprised that it had very little impact on join
or <<
user system total real
Array#join 0.150000 0.000000 0.150000 ( 0.152846)
Array#join 2 0.230000 0.010000 0.240000 ( 0.231272)
String#+ 1 7.450000 5.490000 12.940000 ( 12.936776)
String#+ 2 4.200000 2.590000 6.790000 ( 6.791125)
String#<< 1 0.400000 0.000000 0.400000 ( 0.399452)
String#<< 2 0.380000 0.010000 0.390000 ( 0.389791)
String#<< 2 A 0.340000 0.000000 0.340000 ( 0.341099)
user system total real
Array#join 0.130000 0.010000 0.140000 ( 0.132957)
Array#join 2 0.220000 0.000000 0.220000 ( 0.220181)
String#+ 1 20.060000 5.230000 25.290000 ( 25.293366)
String#+ 2 9.750000 2.670000 12.420000 ( 12.425229)
String#<< 1 0.390000 0.000000 0.390000 ( 0.397733)
String#<< 2 0.390000 0.000000 0.390000 ( 0.390540)
String#<< 2 A 0.330000 0.000000 0.330000 ( 0.333791)
Use Array#join
when you can, and String#<<
when you can't.
The problem with using String#+
is that it must create an intermediary (unwanted) string object, while String#<<
mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", "
1,000 times, via Array#join
, String#+
, and String#<<
:
Ruby 1.9.2p180 user system total real
Array#join 0.320000 0.000000 0.320000 ( 0.330224)
String#+ 1 7.730000 0.200000 7.930000 ( 8.373900)
String#+ 2 4.670000 0.600000 5.270000 ( 5.546633)
String#<< 1 1.260000 0.010000 1.270000 ( 1.315991)
String#<< 2 1.600000 0.020000 1.620000 ( 1.793415)
JRuby 1.6.1 user system total real
Array#join 0.185000 0.000000 0.185000 ( 0.185000)
String#+ 1 9.118000 0.000000 9.118000 ( 9.118000)
String#+ 2 4.544000 0.000000 4.544000 ( 4.544000)
String#<< 1 0.865000 0.000000 0.865000 ( 0.866000)
String#<< 2 0.852000 0.000000 0.852000 ( 0.852000)
Ruby 1.8.7p334 user system total real
Array#join 0.290000 0.010000 0.300000 ( 0.305367)
String#+ 1 7.620000 0.060000 7.680000 ( 7.682265)
String#+ 2 4.820000 0.130000 4.950000 ( 4.957258)
String#<< 1 1.290000 0.010000 1.300000 ( 1.304764)
String#<< 2 1.350000 0.010000 1.360000 ( 1.347226)
Rubinius (head) user system total real
Array#join 0.864054 0.008001 0.872055 ( 0.870757)
String#+ 1 9.636602 0.076005 9.712607 ( 9.714820)
String#+ 2 6.456403 0.064004 6.520407 ( 6.521633)
String#<< 1 2.196138 0.016001 2.212139 ( 2.212564)
String#<< 2 2.176136 0.012001 2.188137 ( 2.186298)
Here's the benchmarking code:
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("String#+ 1"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With