Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Balanced Distribution Algorithm

I'm working on some code for a loosely coupled cluster. To achieve optimal performance during jobs, I have the cluster remap its data each time a child enters or exits. This will eventually be made optional, but for now it performs its data balancing by default. My balancing is basically just making sure that each child never has more than the average number of files per machine, plus one. The plus one is for the remainder if the division isn't clean. And since the remainder will always be less than the number of children [except 0 case, but we can exclude that], children after a balancing will have at most avg + 1.

Everything seems fine, until I realized my algorithm is O(n!). Go down the list of children, find out the avg, remainder, who has too many and who has too few. For each child in the too many list, go through list, send to each child who has too few.

Is there a better solution to this? I feel there must be.

Edit: Here is some psuedocode to show how i derived O(n!):

foreach ( child in children ) {
    if ( child.dataLoad > avg + 1 ) {
        foreach ( child2 in children ) {
            if ( child != child2 && child2.dataLoad < avg ) {
                sendLoad(child, child2)
            }
        }
    }
}

Edit: O(n^2). Foreach n, n => n*n => n^2. I guess I didn't have enough coffee this morning! ;)

In the future I'd like to move to a more flexible and resilient distribution method[weights and hueristics], but for now, a uniform distribution of data works.

like image 446
Nicholas Mancuso Avatar asked Sep 26 '08 13:09

Nicholas Mancuso


2 Answers

@zvrba: You do not even have to sort the list. When traversing the list the second time just move all items with less the average workload to the end of the list (you can keep a pointer to the last item at your first traversal). The order does not have to be perfect, it just changes when the iterators have to be augmented or decreased in your last step.

See previous answer

The last step would look something like:

In the second step keep a pointer to the first item with less than average workload in child2 (to prevent the necessity to have a double link list).

for each child in list {
  if child2 == nil then assert("Error in logic");
  while child.workload > avg + 1 {
    sendwork(child, child2, min(avg + 1 - child2.workload, child.workload - (avg + 1)))
    if child2.workload == avg + 1 then child2 = child2.next;
  }
}
like image 86
Ralph M. Rickenbach Avatar answered Oct 14 '22 04:10

Ralph M. Rickenbach


I think that your analysis is incorrect:

  • walking through the list to find out the average is O(n)
  • making lists of children with too many or too few data chunks is also O(n)
  • moving data is proportional to the amount of data

How did you arrive to O(n!)?

You can sort the list [O(n lg n) in the number of children], so that on the front you have children with too much work, and at the end children with too little work. Then traverse the list from both ends simultaneously: one iterator points to a child with excess data, the other to a child with lack of data. Transfer data, and move either one iterator forward, or the other backward.

like image 26
zvrba Avatar answered Oct 14 '22 06:10

zvrba