Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient to delete all substrings of other elements within an array in Ruby

I have a complex problem of in-place editing of an array at hand. I have an array in which some of the elements are sub-strings of other elements.I want to delete all the sub-strings and keep the supersets/strings only. i.e. Array => ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3'] After operation I should have a sanitized array => ['1 1 1 2', '1 2 3 1']

Is there an efficient algorithm to achieve the same ?

like image 789
Nishutosh Sharma Avatar asked Oct 21 '16 18:10

Nishutosh Sharma


2 Answers

This approach uses some array math to remove itself from the array and then checks to see if it shows up as a substring. I have no idea how performant this is.

a = ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3']
a.uniq.delete_if { |i| (a-[i]).any? {|j| j.include? i } }

I changed to using a delete_if as it will improve performance as you are shortening your array anytime a substring is found making subsequent checks slightly faster.

UPDATE: Cary Swoveland found an issue when the array contains duplicates. I have added a uniq to dedupe the array first although it's not entirely clear what should happen if an element is repeated, should both be removed since they are substrings of each other? I have solved this problem on the assumption that duplicates result in only one item showing in the output, but this could be wrong.

like image 199
Jeff Price Avatar answered Sep 25 '22 15:09

Jeff Price


It uses less memory, performs less computations. This one will delete substrings both ways, looping will be less. Brought

             user       system     total       real
    First    0.000000   0.000000   0.000000 (  0.000076)
    Second   0.010000   0.000000   0.010000 (  0.000037)
    Third    0.000000   0.000000   0.000000 (  0.000019)

Above mentioned is the benchmark results for the 2 algos mentioned above(First and Second) and this one(Third).

array = ['1 1 1', '1', '1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3', '1 2 3', '1 1 1']

i1 = 0
arr_len = array.length
last_index = arr_len - 1

while i1 <= last_index
  w1 = array[i1]
  i2 = i1 + 1
  while i2 <= last_index
    w2 = array[i2]
    # If w2 is a subset of w1
    if w1.include? w2
      # Delete from index i2
      array.delete_at(i2)
      # Decrement the array_length as one element is deleted
      arr_len -= 1
      # Decrement last index, as one element is deleted
      last_index -= 1
      next
    end
    # If w1 comes out to be a subset of w2
    if w2.include? w1
      # Delete the value from that index
      array.delete_at(i1)
      # Decrement the array_length as one element is deleted
      arr_len -= 1
      # Decrement last index, as one element is deleted
      last_index -= 1
      # Reset value of w1 as it is deleted in this operation
      w1 = array[i1]
      # Reset index of 2nd loop to start matching again
      i2 = i1 + 1
      # Move next from here only
      next
    end
    i2 += 1
  end
  i1 += 1
end
like image 31
Nishutosh Sharma Avatar answered Sep 23 '22 15:09

Nishutosh Sharma