Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Go: What is the fastest/cleanest way to remove multiple entries from a slice?

Tags:

slice

go

How would you implement the deleteRecords function in the code below:

Example:

type Record struct {
  id int
  name string
}

type RecordList []*Record

func deleteRecords( l *RecordList, ids []int ) {
   // Assume the RecordList can contain several 100 entries.
   // and the number of the of the records to be removed is about 10.
   // What is the fastest and cleanest ways to remove the records that match
   // the id specified in the records list.
}
like image 380
Jeroen Dirks Avatar asked Feb 16 '11 19:02

Jeroen Dirks


2 Answers

I did some micro-benchmarking on my machine, trying out most of the approaches given in the replies here, and this code comes out fastest when you've got up to about 40 elements in the ids list:

func deleteRecords(data []*Record, ids []int) []*Record {
    w := 0 // write index

loop:
    for _, x := range data {
        for _, id := range ids {
            if id == x.id {
                continue loop
            }
        }
        data[w] = x
        w++
    }
    return data[:w]
}

You didn't say whether it's important to preserve the order of records in the list. If you don't then this function is faster than the above and still fairly clean.

func reorder(data []*Record, ids []int) []*Record {
    n := len(data)
    i := 0
loop:
    for i < n {
        r := data[i]
        for _, id := range ids {
            if id == r.id {
                data[i] = data[n-1]
                n--
                continue loop
            }
        }
        i++
    }
    return data[0:n]
}

As the number of ids rises, so does the cost of the linear search. At around 50 elements, using a map or doing a binary search to look up the id becomes more efficient, as long as you can avoid rebuilding the map (or resorting the list) every time. At several hundred ids, it becomes more efficient to use a map or a binary search even if you have to rebuild it every time.

If you wish to preserve original contents of the slice, something like this is more appropriate:

func deletePreserve(data []*Record, ids []int) []*Record {
    wdata := make([]*Record, len(data))
    w := 0
loop:
    for _, x := range data {
        for _, id := range ids {
            if id == x.id {
                continue loop
            }
        }
        wdata[w] = x
        w++
    }
    return wdata[0:w]
}
like image 136
rog Avatar answered Oct 04 '22 03:10

rog


For a personal project, I did something like this:

func filter(sl []int, fn func(int) bool) []int {
    result := make([]int, 0, len(sl))
    last := 0
    for i, v := range sl {
        if fn(v) {
            result = append(result, sl[last:i]...)
            last = i + 1 
        }   
    }   
    return append(result, sl[last:]...)
}

It doesn't mutate the original, but should be relatively efficient. It's probably better to just do:

func filter(sl []int, fn func(int) bool) (result []int) {
    for _, v := range sl {
       if !fn(v) {
         result = append(result, v)
       }
    }
    return
}

Simpler and cleaner. If you want to do it in-place, you probably want something like:

func filter(sl []int, fn func(int) bool) []int {
    outi := 0
    res := sl
    for _, v := range sl {
        if !fn(v) {
            res[outi] = v 
            outi++
        }   
    }   
    return res[0:outi]
}

You can optimize this to use copy to copy ranges of elements, but that's twice the code and probably not worth it.

So, in this specific case, I'd probably do something like:

func deleteRecords(l []*Record, ids []int) []*Record {
    outi := 0
L:
    for _, v := range l { 
        for _, id := range ids {
            if v.id == id {
                continue L
            }   
        }   
        l[outi] = v 
        outi++
    }   
    return l[0:outi]
}

(Note: untested.)

No allocations, nothing fancy, and assuming the rough size of the list of Records and the list of ids you presented, a simple linear search is likely to do as well as fancier things but without any overhead. I realize that my version mutates the slice and returns a new slice, but that's not un-idiomatic in Go, and it avoids forcing the slice at the callsite to be heap allocated.

like image 33
Kyle C Avatar answered Oct 04 '22 03:10

Kyle C