Sorry if this has been asked before, I am not sure how to even search for it and what I have searched for doesn't yield any useful answer.
Here is my issue, I have a framework that basically manages the jobs that will be submitted to a PBS cluster and each job will need to read from an input file. We are in a case in which we have more than 5k jobs that need to run and there are batches of, lets say, ~30 that read from different files but the rest of them read from a file which is being read by another job.
This could be easily dealt with (although not the best solution buy maybe the quickest one for the timescale we have) by being able to sort the job list by the ID which basically means which file it is going to be reading from, i.e. I would like to sort an array like this
a = [1,1,1,2,2,2,3,3,3,4,4,4]
into
a = [1,2,3,4,1,2,3,4,1,2,3,4]
Is there a way of achieving such an ordering in ruby? I could think of an algorithm buy maybe it has already been done and someone knows the answer.
Thanks!
Thanks to @sagarpandya82 for the original idea and @Cary Swoveland for bug finding!
Either use 2 methods :
def safe_transpose_and_flatten(array)
l = array.map(&:length).max
array.map{|e| e.values_at(0...l)}.transpose.flatten.compact
end
def sort_by_batches(array)
safe_transpose_and_flatten(array.sort.group_by{|i| i}.values)
end
Or this one-liner (split over multiple lines for relative readability) :
def sort_by_batches(array)
array.group_by{|i| i }.values # Chunks of equal values,
.sort_by{|v| -v.size } # sorted by decreasing length,
.reduce(&:zip) # transposed,
.map{|r| r.flatten.compact.sort }.flatten # flattened and sorted
end
a = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
sort_by_batches(a) # => [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
a = [1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1]
sort_by_batches(a) # => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]
a = [1,2,2,3,3,3]
sort_by_batches(a) # => [1, 2, 3, 2, 3, 3]
Here are the steps for the second array :
[1, 1, 3, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 1, 1] # input
{1=>[1, 1, 1, 1], 3=>[3, 3, 3, 3], 2=>[2, 2, 2], 4=>[4, 4, 4], 5=>[5]} # group_by
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # values
[[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2], [4, 4, 4], [5]] # sort_by -length
[[[[[1, 3], 2], 4], 5], [[[[1, 3], 2], 4], nil], [[[[1, 3], 2], 4], nil], [[[[1, 3], nil], nil], nil]] # zip
[[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2, 3, 4], [1, 3]] # map(&:flatten) and compact
[1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3] # flatten
.reduce(&:zip).map(&:flatten).compact
was used at first as a supposedly safe transpose, but it didn't work when the first array was smaller than the others.
The first method use this answer for transposing, the one-liner sorts the arrays by decreasing length before using zip
.
Here's a very basic Job class as an example :
class Job
attr_reader :id
def initialize(id)
@id = id
end
def self.sort_by_batches(jobs)
safe_transpose_and_flatten(jobs.sort_by{|j| j.id}.group_by{|j| j.id}.values)
end
def to_s
"Job %d" % id
end
end
jobs = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4].map{|i| Job.new(i)}
Job.sort_by_batches(jobs)
It outputs :
Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4
Job 1
Job 2
Job 3
Job 4
You could do this with a collation function:
def collate(input)
# Split the input array up into chunks of identical values
# and sort the resulting groups.
sets = input.group_by { |v| v }.values.sort_by(&:first)
# Recombine these into a single output array by iterating over
# each set and transposing values. Any nil values are scrubbed
# with compact.
(0...sets.map(&:length).max).flat_map do |i|
sets.map do |s|
s[i]
end
end.compact
end
You can see this work on some less trivial data:
input = [1,1,3,2,2,2,3,3,3,4,4,4,5,1,1]
collate(input)
# => [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 3]
Here 5
only shows up once.
Code
def doit(a)
b = a.sort.slice_when { |x,y| x != y }
b.max_by(&:size).size.times.flat_map { |i| b.each_with_object([]) { |c,arr|
arr << c[i] unless c[i].nil? } }
end
Example
doit [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]
#=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]
Explanation
For the example the steps are as follows.
a = [5, 1, 7, 2, 3, 3, 5, 2, 3, 1, 4]
c = a.sort
#=> [1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 7]
b = c.slice_when { |x,y| x != y }
#=> #<Enumerator: #<Enumerator::Generator:0x007fb8a99d94c8>:each>
We can see the elements that are generated by the enumerator b
(and passed to the block) by converting it to an array:
b.to_a
#=> [[1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [7]]
Continuing,
c = b.max_by(&:size)
#=> [3, 3, 3]
d = c.size
#=> 3
e = d.times
#=> #<Enumerator: 3:times>
e.to_a
#=> [0, 1, 2]
e.flat_map { |i| b.each_with_object([]) { |c,arr| arr << c[i] unless c[i].nil? } }
#=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]
Here is the last operation with some included puts
statements.
3.times.flat_map do |i|
puts "i=#{i}"
b.each_with_object([]) do |c,arr|
puts " c=#{c}, c[#{i}]=#{c[i]}, arr=#{arr}"
arr << c[i] unless c[i].nil?
puts " arr after arr << c[#{i}]=#{arr}" unless c[i].nil?
end
end
# i=0
# c=[1, 1], c[0]=1, arr=[]
# arr after arr << c[0]=[1]
# c=[2, 2], c[0]=2, arr=[1]
# arr after arr << c[0]=[1, 2]
# c=[3, 3, 3], c[0]=3, arr=[1, 2]
# arr after arr << c[0]=[1, 2, 3]
# c=[4], c[0]=4, arr=[1, 2, 3]
# arr after arr << c[0]=[1, 2, 3, 4]
# c=[5, 5], c[0]=5, arr=[1, 2, 3, 4]
# arr after arr << c[0]=[1, 2, 3, 4, 5]
# c=[7], c[0]=7, arr=[1, 2, 3, 4, 5]
# arr after arr << c[0]=[1, 2, 3, 4, 5, 7]
# i=1
# c=[1, 1], c[1]=1, arr=[]
# arr after arr << c[1]=[1]
# c=[2, 2], c[1]=2, arr=[1]
# arr after arr << c[1]=[1, 2]
# c=[3, 3, 3], c[1]=3, arr=[1, 2]
# arr after arr << c[1]=[1, 2, 3]
# c=[4], c[1]=, arr=[1, 2, 3]
# c=[5, 5], c[1]=5, arr=[1, 2, 3]
# arr after arr << c[1]=[1, 2, 3, 5]
# c=[7], c[1]=, arr=[1, 2, 3, 5]
# i=2
# c=[1, 1], c[2]=, arr=[]
# c=[2, 2], c[2]=, arr=[]
# c=[3, 3, 3], c[2]=3, arr=[]
# arr after arr << c[2]=[3]
# c=[4], c[2]=, arr=[3]
# c=[5, 5], c[2]=, arr=[3]
# c=[7], c[2]=, arr=[3]
#=> [1, 2, 3, 4, 5, 7, 1, 2, 3, 5, 3]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With