Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ruby performance with multiple threads vs one thread

I am writing a program that loads data from four XML files into four different data structures. It has methods like this:

def loadFirst(year)
  File.open("games_#{year}.xml",'r') do |f|
    doc = REXML::Document.new f
    ...
  end
end
def loadSecond(year)
  File.open("teams_#{year}.xml",'r') do |f|
    doc = REXML::Document.new f
    ...
  end
end

etc...

I originally just used one thread and loaded one file after another:

def loadData(year)
  time = Time.now
  loadFirst(year)
  loadSecond(year)
  loadThird(year)
  loadFourth(year)
  puts Time.now - time
end

Then I realized that I should be using multiple threads. My expectation was that loading from each file on a separate thread would be very close to four times as fast as doing it all sequentially (I have a MacBook Pro with an i7 processor):

def loadData(year)
  time = Time.now
  t1 = Thread.start{loadFirst(year)}
  t2 = Thread.start{loadSecond(year)}
  t3 = Thread.start{loadThird(year)}
  loadFourth(year)
  t1.join
  t2.join
  t3.join
  puts Time.now - time
end

What I found was that the version using multiple threads is actually slower than the other. How can this possibly be? The difference is around 20 seconds with each taking around 2 to 3 minutes.

There are no shared resources between the threads. Each opens a different data file and loads data into a different data structure than the others.

like image 584
andrew.cuthbert Avatar asked Oct 21 '22 06:10

andrew.cuthbert


1 Answers

I think (but I'm not sure) the problem is that you are reading (using multiple threads) contents placed on the same disk, so all your threads can't run simultaneously because they wait for IO (disk).

Some days ago I had to do a similar thing (but fetching data from network) and the difference between sequential vs threads was huge.

A possible solution could be to load all file content instead of load it like you did in your code. In your code you read contents line by line. If you load all the content and then process it you should be able to perform much better (because threads should not wait for IO)

like image 185
SDp Avatar answered Oct 27 '22 22:10

SDp