Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Map Sets in Golang

Tags:

struct

go

set

map

If I have a struct like:

type Foo struct {
  title string
  Tags map[string]string
}

How might approach maintaining a unique set of such structs? From what I understand, although struct equality is a thing - map equality isn't. This means I can't compare my above structs. Therefore I can't just implement the map as set pattern.

The two options that might work I can think of are: convert the Tags to a sorted [][]string or use reflect.Deepequal. Anyone have a better idea?

like image 235
Kyle Brandt Avatar asked Oct 21 '22 18:10

Kyle Brandt


1 Answers

There are a few ways of implementing this. James Henstridge actually had a good idea, and I attempted to implement it. It performed pretty poorly just to use map in the first place, without my own hash algorithm.

The way I solved this problem is just keep an array of your structs and then remove any duplicates as you insert them.

package structset

type Foo struct {
  title string
  Tags  map[string]string
}

func (f Foo) Equals(f2 Foo) bool {
  if f.title != f2.title {
    return false
  }

  if len(f.Tags) != len(f2.Tags) {
    return false
  }

  for k, v := range f.Tags {
    if w, ok := f2.Tags[k]; !ok || v != w {
      return false
    }
  }

  return true
}

type FooSet []Foo

func (this FooSet) Add(value Foo) {
  if !this.Contains(value) {
    this = append(this, value)
  }
}

func (this FooSet) Length() int {
  return len(this)
}

func (this FooSet) Contains(f Foo) bool {
  for _, v := range this {
    if v.Equals(f) {
      return true
    }
  }
  return false
}

func NewSet() FooSet {
  return FooSet(make([]Foo, 0, 100))
}

I benchmarked this on my i7-3770K Windows machine and got:

BenchmarkSmallSetWithFewCollisions         50000             46615 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             46575 ns/op
BenchmarkSmallSetWithManyCollisions        50000             46605 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2335296 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2352298 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2336796 ns/op
BenchmarkLargeSetWithFewCollisions            50          46805944 ns/op
BenchmarkLargeSetWithMoreCollisions           50          47376016 ns/op
BenchmarkLargeSetWithManyCollisions           50          46815946 ns/op

To eek out a very small amount of performance, you can insert all your data into the array first, and then remove all duplicates after.

The remove duplicates code is:

func (this FooSet) RemoveDuplicates() {
  length := len(this) - 1
  for i := 0; i < length; i++ {
    for j := i + 1; j <= length; j++ {
      if this[i].Equals(this[j]) {
        this[j] = this[length]
        this = this[0:length]
        length--
        j--
      }
    }
  }
}

The benchmarks for this is:

BenchmarkSmallSetWithFewCollisions         50000             45245 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             45615 ns/op
BenchmarkSmallSetWithManyCollisions        50000             45555 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2294791 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2309293 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2286290 ns/op
BenchmarkLargeSetWithFewCollisions            50          46235870 ns/op
BenchmarkLargeSetWithMoreCollisions           50          46515906 ns/op
BenchmarkLargeSetWithManyCollisions           50          45865824 ns/op

Here is the benchmark of just assigning Foo to a map[string]Foo.

BenchmarkSmallSetWithFewCollisions         50000             65718 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             64238 ns/op
BenchmarkSmallSetWithManyCollisions        50000             55016 ns/op
BenchmarkMediumSetWithFewCollisions          500           3429435 ns/op
BenchmarkMediumSetWithMoreCollisions         500           3117395 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2826858 ns/op
BenchmarkLargeSetWithFewCollisions            20          82635495 ns/op
BenchmarkLargeSetWithMoreCollisions           20          85285830 ns/op
BenchmarkLargeSetWithManyCollisions           20          73659350 ns/op

It seems to me even if a map was hashable, it still wouldn't perform very well.

like image 89
kdar Avatar answered Oct 24 '22 01:10

kdar