Performance of the Go select statement in a for loop

Tags:

go

I wrote a test to measure the performance of select, and the result is not good. The Go version is 1.7.3.

package main

import (
    "fmt"
    "log"
    "os"
    "runtime/pprof"
    "time"
)

var serverDone = make(chan struct{})
var serverDone1 = make(chan struct{})
var serverDone2 = make(chan struct{})
var serverDone3 = make(chan struct{})
var serverDone4 = make(chan struct{})
var serverDone5 = make(chan struct{})

func main() {
    f, err := os.Create("cpu.pprof")
    if err != nil {
        log.Fatal(err)
    }
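    // Profile CPU usage for the whole run; abort if profiling cannot start.
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }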
    defer pprof.StopCPUProfile()

    for i := 0; i < 1000; i++ {
        go messageLoop()
    }
    <-time.After(10 * time.Second)
    close(serverDone)
    fmt.Println("finished")
}

func messageLoop() {
    var ticker = time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    var counter = 0
    for {
        select {
        case <-serverDone:
            return
        case <-serverDone1:
            return
        // case <-serverDone2:
        //  return
        // case <-serverDone3:
        //  return
        // case <-serverDone4:
        //  return
        // case <-serverDone5:
        //  return
        case <-ticker.C:
            counter += 1
        }
    }
}

When you run the above code, CPU usage goes up (on my machine, by about 5%) each time a serverDone case is added.
Even when all of the serverDone cases are removed, CPU usage is still about 5%, which is not good.
If I change the global channels (like serverDone) to local ones, the performance is better, but still not good enough.

Is there anything wrong with my test, or what is the correct way to use the select statement?

fwang2002 asked Feb 06 '17 03:02



1 Answer

Short answer: channels use a mutex. More channels means more futex system calls.

Here are the strace syscall summaries for the two programs.

The code with a select waiting on 7 channels (7 cases):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.20    0.424434          13     33665      6061 futex
  1.09    0.004731          10       466           sched_yield
  0.47    0.002038          30        67           select
  0.11    0.000484           4       114           rt_sigaction
  0.05    0.000203           5        41         8 rt_sigreturn
  0.03    0.000128           9        15           mmap
  0.02    0.000081          27         3           clone
  0.01    0.000052           7         8           rt_sigprocmask
  0.01    0.000032          32         1           openat
  0.00    0.000011           4         3           setitimer
  0.00    0.000009           5         2           sigaltstack
  0.00    0.000008           8         1           munmap
  0.00    0.000006           6         1           execve
  0.00    0.000006           6         1           sched_getaffinity
  0.00    0.000004           4         1           arch_prctl
  0.00    0.000004           4         1           gettid
  0.00    0.000000           0         2         2 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.432231                 34392      6071 total

The code with a select waiting on 3 channels (3 cases):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.47    0.118614          11     10384      1333 futex
  6.64    0.008704          11       791           sched_yield
  2.06    0.002706          23       120           select
  0.39    0.000512           4       114           rt_sigaction
  0.14    0.000181           8        22         2 rt_sigreturn
  0.10    0.000131           9        15           mmap
  0.05    0.000060          60         1           openat
  0.04    0.000057          19         3           setitimer
  0.04    0.000051          17         3           clone
  0.03    0.000045           6         8           rt_sigprocmask
  0.01    0.000009           9         1           execve
  0.01    0.000009           5         2           sigaltstack
  0.01    0.000009           9         1           sched_getaffinity
  0.01    0.000008           8         1           munmap
  0.01    0.000007           7         1           arch_prctl
  0.00    0.000005           5         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.131108                 11468      1335 total

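As the two summaries show, the number of futex calls grows with the number of channels (33665 calls with 7 channels versus 10384 with 3). futex dominates the system call time in both runs (98.20% and 90.47%), and these calls are the reason for the poor performance.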

Here is the explanation.

The channel implementation is in src/runtime/chan.go.

Here is hchan, the struct behind a channel:

type hchan struct {
    qcount   uint           // total data in the queue
    dataqsiz uint           // size of the circular queue
    buf      unsafe.Pointer // points to an array of dataqsiz elements
    elemsize uint16
    closed   uint32
    elemtype *_type // element type
    sendx    uint   // send index
    recvx    uint   // receive index
    recvq    waitq  // list of recv waiters
    sendq    waitq  // list of send waiters
    lock     mutex
}

The lock field is a runtime mutex, defined in runtime2.go, that is implemented with a futex or a semaphore depending on the OS.

So as the number of channels in the select increases, there are more futex system calls, and that hurts performance.

You can read more about these in: futex(2), Channels in steroids

Sarath Sadasivan Pillai answered Sep 21 '22 05:09