Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort array by batches in ruby

Tags:

sorting

ruby

Sorry if this has been asked before, I am not sure how to even search for it and what I have searched for doesn't yield any useful answer.

Here is my issue, I have a framework that basically manages the jobs that will be submitted to a PBS cluster and each job will need to read from an input file. We are in a case in which we have more than 5k jobs that need to run and there are batches of, lets say, ~30 that read from different files but the rest of them read from a file which is being read by another job.

This could be easily dealt with (although not the best solution buy maybe the quickest one for the timescale we have) by being able to sort the job list by the ID which basically means which file it is going to be reading from, i.e. I would like to sort an array like this

a = [1,1,1,2,2,2,3,3,3,4,4,4]

into

a = [1,2,3,4,1,2,3,4,1,2,3,4]

Is there a way of achieving such an ordering in ruby? I could think of an algorithm buy maybe it has already been done and someone knows the answer.

Thanks!

like image 880
Juanpe Araque Avatar asked Dec 19 '16 21:12

Juanpe Araque


3 Answers

Solution

Thanks to @sagarpandya82 for the original idea and @Cary Swoveland for bug finding!

Either use 2 methods :

def safe_transpose_and_flatten(array)
  l = array.map(&:length).max
  array.map{|e| e.values_at(0...l)}.transpose.flatten.compact
end

def sort_by_batches(array)
  safe_transpose_and_flatten(array.sort.group_by{|i| i}.values)
end

Or this one-liner (split over multiple lines for relative readability) :

def sort_by_batches(array)
  array.group_by{|i| i }.values                  # Chunks of equal values,
       .sort_by{|v| -v.size }                    # sorted by decreasing length,
       .reduce(&:zip)                            # transposed,
       .map{|r| r.flatten.compact.sort }.flatten # flattened and sorted
end

Example

a = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
sort_by_batches(a) # => [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

a = [1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1]
sort_by_batches(a) # => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]

a = [1,2,2,3,3,3]
sort_by_batches(a) # => [1, 2, 3, 2, 3, 3]

Steps

Here are the steps for the second array :

[1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1] # input
{1=>[1, 1, 1, 1], 3=>[3, 3, 3, 3], 2=>[2, 2, 2], 4=>[4, 4, 4], 5=>[5]} # group_by
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # values
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # sort_by -length
[[[[[1, 3], 2], 4], 5], [[[[1, 3], 2], 4], nil], [[[[1, 3], 2], 4], nil], [[[[1, 3], nil], nil], nil]] # zip
[[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4], [1, 3]] # map(&:flatten) and compact
[1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3] # flatten

.reduce(&:zip).map(&:flatten).compact was used at first as a supposedly safe transpose, but it didn't work when the first array was smaller than the others.

The first method use this answer for transposing, the one-liner sorts the arrays by decreasing length before using zip.

Application to Job class

Here's a very basic Job class as an example :

class Job
  attr_reader :id
  def initialize(id)
    @id = id
  end

  def self.sort_by_batches(jobs)
    safe_transpose_and_flatten(jobs.sort_by{|j| j.id}.group_by{|j| j.id}.values)
  end

  def to_s
    "Job %d" % id
  end
end

jobs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4].map{|i| Job.new(i)}
Job.sort_by_batches(jobs)

It outputs :

Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4
like image 138
Eric Duminil Avatar answered Nov 09 '22 12:11

Eric Duminil


You could do this with a collation function:

def collate(input)
  # Split the input array up into chunks of identical values
  # and sort the resulting groups.
  sets = input.group_by { |v| v }.values.sort_by(&:first)

  # Recombine these into a single output array by iterating over
  # each set and transposing values. Any nil values are scrubbed
  # with compact.
  (0...sets.map(&:length).max).flat_map do |i|
    sets.map do |s|
      s[i]
    end
  end.compact
end

You can see this work on some less trivial data:

input = [1,1,3,2,2,2,3,3,3,4,4,4,5,1,1]

collate(input)
# => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]

Here 5 only shows up once.

like image 45
tadman Avatar answered Nov 09 '22 13:11

tadman


Code

def doit(a)    
  b = a.sort.slice_when { |x,y| x != y }
  b.max_by(&:size).size.times.flat_map { |i| b.each_with_object([]) { |c,arr|
    arr << c[i] unless c[i].nil? } } 
end

Example

doit [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]
  #=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]

Explanation

For the example the steps are as follows.

a = [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]

c = a.sort
  #=> [1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 7] 
b = c.slice_when { |x,y| x != y }
  #=> #<Enumerator: #<Enumerator::Generator:0x007fb8a99d94c8>:each> 

We can see the elements that are generated by the enumerator b (and passed to the block) by converting it to an array:

b.to_a
  #=> [[1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [7]] 

Continuing,

c = b.max_by(&:size)
  #=> [3, 3, 3] 
d = c.size
  #=> 3 
e = d.times
  #=> #<Enumerator: 3:times>
e.to_a
  #=> [0, 1, 2] 
e.flat_map { |i| b.each_with_object([]) { |c,arr| arr << c[i] unless c[i].nil? } }
  #=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3] 

Here is the last operation with some included puts statements.

3.times.flat_map do |i|
  puts "i=#{i}"
  b.each_with_object([]) do |c,arr|
    puts "  c=#{c}, c[#{i}]=#{c[i]}, arr=#{arr}"
    arr << c[i] unless c[i].nil?
    puts "  arr after arr << c[#{i}]=#{arr}" unless c[i].nil? 
  end
end

# i=0
#   c=[1, 1], c[0]=1, arr=[]
#     arr after arr << c[0]=[1]
#   c=[2, 2], c[0]=2, arr=[1]
#     arr after arr << c[0]=[1, 2]
#   c=[3, 3, 3], c[0]=3, arr=[1, 2]
#     arr after arr << c[0]=[1, 2, 3]
#   c=[4], c[0]=4, arr=[1, 2, 3]
#     arr after arr << c[0]=[1, 2, 3, 4]
#   c=[5, 5], c[0]=5, arr=[1, 2, 3, 4]
#     arr after arr << c[0]=[1, 2, 3, 4, 5]
#   c=[7], c[0]=7, arr=[1, 2, 3, 4, 5]
#     arr after arr << c[0]=[1, 2, 3, 4, 5, 7]
# i=1
#   c=[1, 1], c[1]=1, arr=[]
#     arr after arr << c[1]=[1]
#   c=[2, 2], c[1]=2, arr=[1]
#     arr after arr << c[1]=[1, 2]
#   c=[3, 3, 3], c[1]=3, arr=[1, 2]
#     arr after arr << c[1]=[1, 2, 3]
#   c=[4], c[1]=, arr=[1, 2, 3]
#   c=[5, 5], c[1]=5, arr=[1, 2, 3]
#     arr after arr << c[1]=[1, 2, 3, 5]
#   c=[7], c[1]=, arr=[1, 2, 3, 5]
# i=2
#   c=[1, 1], c[2]=, arr=[]
#   c=[2, 2], c[2]=, arr=[]
#   c=[3, 3, 3], c[2]=3, arr=[]
#     arr after arr << c[2]=[3]
#   c=[4], c[2]=, arr=[3]
#   c=[5, 5], c[2]=, arr=[3]
#   c=[7], c[2]=, arr=[3]
#=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3] 
like image 27
Cary Swoveland Avatar answered Nov 09 '22 14:11

Cary Swoveland