Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Time complexity of gsub

A long string s contains only 0 and 1. This Ruby code counts how many 1 there are:

s.gsub(/1/).count

What is the time complexity in Big O notation? Is there a tool to do the calculation?

like image 607
canoe Avatar asked Jan 12 '23 20:01

canoe


1 Answers

As far as I know there is not a generic tool to calculate Big O notation for arbitrary code - this would be a hard task - for a start it is not always clear which scaling issue to measure. Even very simple logic, a = b + c does not scale as you might think - if you need to allow for arbitrarily large numbers then this is not O( 1 ).

The simplest thing to do is benchmark and plot the results. You should be aware that for highly efficient code that the scaling measure can be "drowned out" by for example method-calling overhead. So it takes a little effort - you need to find values of N large enough that doubling it makes a significant difference.

To verify that your command is O( N ) on string length, I ran the following benchmarks:

require 'benchmark'

def times_for length
  str = '012123132132121321313232132313232132' * length
  Benchmark.bm do |x|
    x.report( :gsub ) { 1000.times { str.gsub(/1/).count } }
  end
end

times_for 250
       user     system      total        real
gsub  1.510000   0.000000   1.510000 (  1.519658)


times_for 500
       user     system      total        real
gsub  2.960000   0.000000   2.960000 (  2.962822)


times_for 1000
       user     system      total        real
gsub  5.880000   0.010000   5.890000 (  5.879543)

By inspection you can see this is a simple linear relationship - every time I double the length of the string, the time taken (roughly) doubles.

By the way, s.count('1') is much, much faster, but is also O( N ):

times_for 250
       user     system      total        real
count  0.010000   0.000000   0.010000 (  0.008791)

times_for 500
       user     system      total        real
count  0.010000   0.000000   0.010000 (  0.016489)

times_for 1000
       user     system      total        real
count  0.020000   0.000000   0.020000 (  0.029052)

You could take the return values from benchmarking, and use them to automate a guess at Big O. This is probably what services like codility do.

like image 164
Neil Slater Avatar answered Jan 21 '23 18:01

Neil Slater