Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a slice into N slices

Tags:

slice

split

go

I'm a trying to implement a function that splits a slice of TCP ports into x others slices. Those slices will be sent to workers that will scan those ports, so x is set by the number of workers.

Here is the code :

// createJobs split portsToScan from a specified protocol into an equal number
// of jobs that will be returned.
func (t *Target) createJobs(proto string) ([]jobMsg, error) {
    // init jobs slice
    jobs := []jobMsg{}

    // check protocol accordance
    if _, ok := t.portsToScan[proto]; !ok {
        return nil, fmt.Errorf("no such protocol %q in current protocol list", proto)
    }

    // if proto is ICMP, we do not need to scan ports
    if proto == "icmp" {
        return []jobMsg{
            jobMsg{ip: t.ip, protocol: proto},
        }, nil
    }

    step := (len(t.portsToScan[proto]) + t.workers - 1) / t.workers

    for i := 0; i < len(t.portsToScan[proto]); i += step {
        batch := t.portsToScan[proto][i:min(i+step, len(t.portsToScan[proto]))]

        jobs = append(jobs, jobMsg{
            ip:       t.ip,
            protocol: proto,
            ports:    batch,
        })
    }

    return jobs, nil
}

And here is the unit test corresponding :

func TestTarget_createJobs(t *testing.T) {
    tests := []struct {
        name         string
        pts          map[string][]string
        workersCount int
        wantErr      bool
    }{
        {
            name:         "5-1",
            pts:          map[string][]string{"tcp": []string{"1", "2", "3", "4", "5"}},
            workersCount: 1,
        },
        {
            name:         "5-2",
            pts:          map[string][]string{"tcp": []string{"1", "2", "3", "4", "5"}},
            workersCount: 2,
        },
        {
            name:         "5-3",
            pts:          map[string][]string{"tcp": []string{"1", "2", "3", "4", "5"}},
            workersCount: 3,
        },
        {
            name:         "5-4",
            pts:          map[string][]string{"tcp": []string{"1", "2", "3", "4", "5"}},
            workersCount: 4,
        },
        {
            name:         "5-5",
            pts:          map[string][]string{"tcp": []string{"1", "2", "3", "4", "5"}},
            workersCount: 5,
        },
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            tg := &Target{
                portsToScan: tt.pts,
                workers:     tt.workersCount,
            }
            got, err := tg.createJobs("tcp")
            if (err != nil) != tt.wantErr {
                t.Errorf("Target.createJobs() error = %v, wantErr %v", err, tt.wantErr)
                return
            }
            if len(got) != tt.workersCount {
                t.Errorf("Target.createJobs() = %d, wanted %d jobs; joblist %v", len(got), tt.workersCount, got)
            }
        })
    }
}


func min(a, b int) int {
    if a <= b {
        return a
    }
    return b
}

The output of the test give me this result :

--- FAIL: TestTarget_createJobs/5-4 (0.00s)
scan_test.go:309: Target.createJobs() = 3, wanted 4 jobs; joblist [{ 0  tcp [1 2]} { 0  tcp [3 4]} { 0  tcp [5]}]

Initial ports list is hold in t.portsToScan[proto] and the number of workers (so the number of slices I want to create) is set by t.workers.

At the end, len(jobs) must be equal to t.workers but I can't find how to do it.

like image 481
hacb Avatar asked May 16 '26 17:05

hacb


1 Answers

Your algorithm uses step as the size of batches:

step := (len(t.portsToScan[proto]) + t.workers - 1) / t.workers

This isn't the optimal size. For example if you have 4 ports to scan and 3 workers, this results in step = 2, which means you'll have only 2 jobs (2+2=4). But it would be better (more optimal) to have 3 batches (with sizes 2+1+1=4).

So the size of batches should be

defSize := len(t.portsToScan[proto]) / t.workers

The problem with this is that if the length is not a multiple of t.workers, some of the last elements (ports) will not be assigned to any of the jobs. Using defSize+1 for all jobs would be too many.

So the optimal solution is in the "middle": some jobs will have defSize ports to scan, and some will have defSize+1. How many must have defSize+1? As many as missing if all would have defSize:

numBigger := len(t.portsToScan[proto]) - defSize*t.workers

Note that if there are less ports to scan than workers, the above calculation yields defSize=0, so some workers would get 0 ports to scan, and some would get 1. That's OK, but you shouldn't add jobs with 0 ports to scan.

Using this distribution:

defSize := len(t.portsToScan[proto]) / t.workers
numBigger := len(t.portsToScan[proto]) - defSize*t.workers

size := defSize+1
for i, idx := 0, 0; i < t.workers; i++ {
    if i == numBigger {
        size--
        if size == 0 {
            break // 0 ports left to scan
        }
    }
    jobs = append(jobs, jobMsg{
        ip:       t.ip,
        protocol: proto,
        ports:    t.portsToScan[proto][idx : idx+size],
    })
    idx += size
}
like image 193
icza Avatar answered May 21 '26 03:05

icza



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!