Go syscall vs. C system call

Q: What is syscall in C?

syscall() is a small library function that invokes the system call whose assembly language interface has the specified number with the specified arguments. Employing syscall() is useful, for example, when invoking a system call that has no wrapper function in the C library.

Q: Can a syscall call another syscall?

A system call handler does not invoke other system calls through the system call mechanism: it is already running in the kernel, so it calls the kernel's internal functions directly instead of paying for another user-to-kernel transition.

Q: How long does a syscall take?

Syscalls take at least 1-2 microseconds on most modern machines just for the syscall overhead, and much more time if they're doing anything complex that could block or sleep. Expect at least 20 microseconds and up to the order of milliseconds for IO.
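
The per-call overhead can be estimated on a given machine with a short loop. The following is a minimal sketch, not a rigorous benchmark: `perCall` is a hypothetical helper name, and `syscall.Getppid` is chosen here only as an example of a call that does very little work in the kernel. The number it prints depends on the hardware, the kernel and its security mitigations.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// perCall (hypothetical helper) times n invocations of a cheap system call
// and returns the average wall-clock duration of a single call.
func perCall(n int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		syscall.Getppid() // little kernel work, so the result is mostly call overhead
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	fmt.Println("average time per call:", perCall(1_000_000))
}
```

Calls that can block or that move data, such as read and write, will take longer than this lower bound.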

Q: What happens during a syscall?

When a user program invokes a system call, a system call instruction is executed, which causes the processor to begin executing the system call handler in the kernel protection domain.

Tags:

c

go

system-calls

Go and C both invoke system calls directly (technically, C calls a stub).

Technically, write is both a system call and a C function (at least on many systems). However, the C function is just a stub which invokes the system call. Go does not call this stub; it invokes the system call directly, which means that C is not involved here.

From Differences between C write call and Go syscall.Write
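
To make "invokes the system call directly" concrete, here is a minimal sketch that issues write(2) by system call number through `syscall.Syscall`, with no C library involved. It assumes a Unix-like platform where `syscall.SYS_WRITE` and `syscall.Syscall` are available (Linux in particular); `rawWrite` and `roundTrip` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// rawWrite (hypothetical helper) issues write(2) by number through
// syscall.Syscall, without going through any C library stub.
func rawWrite(fd int, p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	n, _, errno := syscall.Syscall(syscall.SYS_WRITE,
		uintptr(fd), uintptr(unsafe.Pointer(&p[0])), uintptr(len(p)))
	if errno != 0 {
		return 0, errno
	}
	return int(n), nil
}

// roundTrip writes msg into a pipe with rawWrite and reads it back.
func roundTrip(msg string) (string, error) {
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	if _, err := rawWrite(fds[1], []byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	n, err := syscall.Read(fds[0], buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := roundTrip("hello, world!")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

In ordinary code `syscall.Write` is the better choice; the sketch only spells out what such a wrapper boils down to.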

My benchmark shows that a pure C system call is 15.82% faster than a pure Go system call in the latest release (go1.11).

What did I miss? What could be the reason, and how can the Go calls be optimized?

Benchmarks:

Go:

package main_test

import (
    "syscall"
    "testing"
)

// writeAll keeps calling write(2) until the whole buffer has been written.
func writeAll(fd int, buf []byte) error {
    for len(buf) > 0 {
        n, err := syscall.Write(fd, buf)
        if err != nil {
            return err
        }
        buf = buf[n:]
    }
    return nil
}

func BenchmarkReadWriteGoCalls(b *testing.B) {
    fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
    if err != nil {
        b.Fatal(err)
    }
    message := "hello, world!"
    buffer := make([]byte, 13)
    b.ResetTimer() // exclude the setup above from the measurement
    for i := 0; i < b.N; i++ {
        writeAll(fds[0], []byte(message))
        syscall.Read(fds[1], buffer)
    }
}

C:

#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

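/* Keeps calling write(2) until the whole buffer has been written.
   Returns 0 on success and -1 on error. */
int write_all(int fd, const void* buffer, size_t length) {
    const char* p = buffer; /* arithmetic on void* is not standard C */
    while (length > 0) {
        ssize_t written = write(fd, p, length);
        if (written < 0)
            return -1;
        length -= (size_t)written;
        p += written;
    }
    return 0;
}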

int read_call(int fd, void *buffer, size_t length) {
    return read(fd, buffer, length);
}

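/* Note: CLOCK_PROCESS_CPUTIME_ID counts CPU time consumed by the process,
   whereas Go's testing package reports wall-clock time. */
struct timespec timer_start(){
    struct timespec start_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
    return start_time;
}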

long timer_end(struct timespec start_time){
    struct timespec end_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end_time);
    long diffInNanos = (end_time.tv_sec - start_time.tv_sec) * (long)1e9 + (end_time.tv_nsec - start_time.tv_nsec);
    return diffInNanos;
}

int main(void) {
    int i = 0;
    int N = 500000;
    int fds[2];
    char message[14] = "hello, world!"; /* 13 characters plus the terminating NUL */
    char buffer[14] = {0};

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }
    struct timespec vartime = timer_start();
    for(i = 0; i < N; i++) {
        write_all(fds[0], message, sizeof(message));
        read_call(fds[1], buffer, sizeof(buffer));
    }
    long time_elapsed_nanos = timer_end(vartime);
    printf("BenchmarkReadWritePureCCalls\t%d\t%ld ns/op\n", N, time_elapsed_nanos/N);
    return 0;
}

[Figure: results of 340 runs. Each C run performs 500000 iterations, and each Go run performs b.N iterations (mostly 500000, occasionally 1000000).]

T-Test for 2 Independent Means: The t-value is -22.45426. The p-value is < .00001. The result is significant at p < .05.

T-Test Calculator for 2 Dependent Means: The value of t is 15.902782. The value of p is < 0.00001. The result is significant at p ≤ 0.05.

Update: I tried the approach proposed in the answer and wrote another benchmark. It shows that the proposed approach significantly reduces the performance of massive I/O calls; its performance is close to that of cgo calls.

Benchmark:

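// Uses the same imports as the benchmark above, plus "net" and "os".
func BenchmarkReadWriteNetCalls(b *testing.B) {
    cs, err := socketpair()
    if err != nil {
        b.Fatal(err)
    }
    message := "hello, world!"
    buffer := make([]byte, 13)
    b.ResetTimer() // exclude the setup above from the measurement
    for i := 0; i < b.N; i++ {
        cs[0].Write([]byte(message))
        cs[1].Read(buffer)
    }
}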

func socketpair() (conns [2]net.Conn, err error) {
    fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
    if err != nil {
        return
    }
    conns[0], err = fdToFileConn(fds[0])
    if err != nil {
        return
    }
    conns[1], err = fdToFileConn(fds[1])
    if err != nil {
        conns[0].Close()
        return
    }
    return
}

func fdToFileConn(fd int) (net.Conn, error) {
    f := os.NewFile(uintptr(fd), "")
    defer f.Close()
    return net.FileConn(f)
}

[Figure: results of 100 runs. Each C run performs 500000 iterations, and each Go run performs b.N iterations (mostly 500000, occasionally 1000000).]

asked Sep 12 '18 14:09

Jakob

1 Answer

My benchmark shows that a pure C system call is 15.82% faster than a pure Go system call in the latest release (go1.11).

What did I miss? What could be the reason, and how can the Go calls be optimized?

The reason is that while both C and Go (on a typical platform Go supports—such as Linux or *BSD or Windows) are compiled down to machine code, Go-native code runs in an environment quite different from that of C.

The two chief differences to C are:

- Go code runs in the context of so-called goroutines, which are freely scheduled by the Go runtime on different OS threads.
- Goroutines use their own (growable and reallocatable) lightweight stacks, which have nothing to do with the OS-supplied stack C code uses.
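
The growable stacks can be seen from ordinary Go code. The sketch below (the names `depth` and `deepInGoroutine` are made up for this illustration) recurses 100000 levels deep on a freshly started goroutine. A new goroutine's stack starts at only a few kilobytes, so the recursion completes only because the runtime grows that stack on demand.

```go
package main

import "fmt"

// depth recurses n levels deep. The call is not in tail position, so every
// level occupies its own frame on the goroutine's stack.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// deepInGoroutine runs the recursion on a new goroutine, whose stack starts
// small and is grown (and moved) by the runtime as the recursion deepens.
func deepInGoroutine(n int) int {
	result := make(chan int)
	go func() { result <- depth(n) }()
	return <-result
}

func main() {
	fmt.Println(deepInGoroutine(100000))
}
```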

So, when Go code wants to make a syscall, quite a lot has to happen:

1. The goroutine which is about to enter a syscall must be "pinned" to the OS thread on which it's currently running.
2. The execution must be switched to use the OS-supplied C stack.
3. The necessary preparations in the Go runtime's scheduler are made.
4. The goroutine enters the syscall.
5. Upon exiting the syscall, the execution of the goroutine has to be resumed. This is a relatively involved process in itself, and it may be additionally hampered if the goroutine was in the syscall for too long: in that case the scheduler removes the so-called "processor" from under that goroutine, spawns another OS thread, and makes that processor run another goroutine ("processors", or Ps, are the entities which run goroutines on OS threads).
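
The hand-off of the "processor" can be observed from user code. In the sketch below, which assumes a Unix-like system, the program restricts itself to a single P and lets one goroutine block inside a raw read(2) on an empty pipe. The main goroutine still gets to run and eventually writes the data that unblocks the reader. `readWhileBlocked` is a hypothetical name, and the 10 ms sleep is an arbitrary value chosen only to give the reader time to enter the syscall.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// readWhileBlocked (hypothetical helper) shows that a goroutine blocked in a
// syscall does not stop other goroutines, even with a single P.
func readWhileBlocked() (string, error) {
	prev := runtime.GOMAXPROCS(1) // one P: both goroutines must share it
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	type result struct {
		data string
		err  error
	}
	got := make(chan result, 1)
	go func() {
		buf := make([]byte, 16)
		// Blocks inside read(2) until the main goroutine writes to the pipe.
		n, err := syscall.Read(fds[0], buf)
		if err != nil {
			got <- result{"", err}
			return
		}
		got <- result{string(buf[:n]), nil}
	}()

	// Let the reader enter the syscall. The main goroutine can only resume
	// because the runtime lets it run while the reader's thread is blocked.
	time.Sleep(10 * time.Millisecond)
	if _, err := syscall.Write(fds[1], []byte("wake up")); err != nil {
		return "", err
	}
	r := <-got
	return r.data, r.err
}

func main() {
	data, err := readWhileBlocked()
	if err != nil {
		panic(err)
	}
	fmt.Println(data)
}
```

This bookkeeping keeps the rest of the program responsive, and it is part of the extra work a Go syscall performs compared to a plain C call.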

Update to answer the OP's comment

<…> Thus there is no way to optimize and I must suffer that if I make massive IO calls, mustn't I?

It heavily depends on the nature of the "massive I/O" you're after.

If your example (with socketpair(2)) is not a toy, there is simply no reason to use syscalls directly: the FDs returned by socketpair(2) are "pollable", and hence the Go runtime can use its native "netpoller" machinery to perform I/O on them. Here is working code from one of my projects which properly "wraps" the FDs produced by socketpair(2) so that they can be used as "regular" sockets (like those produced by functions from the net standard package):

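// socketpair returns both ends of a socketpair(2) wrapped as net.Conn values.
func socketpair() (net.Conn, net.Conn, error) {
    fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
    if err != nil {
        return nil, nil, err
    }

    c1, err := fdToFileConn(fds[0])
    if err != nil {
        return nil, nil, err
    }

    c2, err := fdToFileConn(fds[1])
    if err != nil {
        c1.Close()
        return nil, nil, err
    }

    return c1, c2, nil
}

// fdToFileConn wraps a raw descriptor in a net.Conn. net.FileConn works on a
// duplicate of the descriptor, so closing f here does not affect the result.
func fdToFileConn(fd int) (net.Conn, error) {
    f := os.NewFile(uintptr(fd), "")
    defer f.Close()
    return net.FileConn(f)
}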
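
As a usage illustration, the following self-contained sketch sends one message through such a wrapped pair. It follows the same idea as the code above, but `connPair` and `echo` are names invented for this example, and it assumes a Unix-like system where `syscall.Socketpair` is available.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"syscall"
)

// connPair (hypothetical helper) wraps both ends of a socketpair(2) in
// net.Conn values, so that I/O on them goes through the runtime's netpoller.
func connPair() (net.Conn, net.Conn, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	f0 := os.NewFile(uintptr(fds[0]), "socketpair-0")
	f1 := os.NewFile(uintptr(fds[1]), "socketpair-1")
	// net.FileConn duplicates the descriptor, so the files can be closed.
	defer f0.Close()
	defer f1.Close()

	c0, err := net.FileConn(f0)
	if err != nil {
		return nil, nil, err
	}
	c1, err := net.FileConn(f1)
	if err != nil {
		c0.Close()
		return nil, nil, err
	}
	return c0, c1, nil
}

// echo writes msg on one end of the pair and reads it back from the other.
func echo(msg string) (string, error) {
	a, b, err := connPair()
	if err != nil {
		return "", err
	}
	defer a.Close()
	defer b.Close()

	if _, err := a.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(b, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := echo("hello, world!")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

With this setup a goroutine waiting for data is parked by the runtime instead of occupying an OS thread in a blocking read.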

If you're talking about some other sort of I/O, the answer is that yes, syscalls are not really cheap and if you must do lots of them, there are ways to work around their cost (such as offloading to some C code—linked in or hooked up as an external process—which would somehow batch them so that each call to that C code would result in several syscalls done by the C side).
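
A related but different workaround, which is not the C offloading described above, is to reduce the number of syscalls in the first place by coalescing many small writes into a few large ones. The sketch below does this in pure Go with `bufio.Writer`. The `countingWriter` type and the `batchedWrites` function are hypothetical names for this illustration, and the 4096-byte buffer size is an arbitrary choice.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// countingWriter counts the Write calls that reach the wrapped writer.
// For an *os.File, each such call results in at least one write(2).
type countingWriter struct {
	w     io.Writer
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return c.w.Write(p)
}

// batchedWrites (hypothetical helper) sends msg n times into a pipe through
// a buffered writer and reports how many writes reached the pipe.
func batchedWrites(n int, msg string) (int, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return 0, err
	}
	defer r.Close()
	defer w.Close()

	cw := &countingWriter{w: w}
	bw := bufio.NewWriterSize(cw, 4096)
	for i := 0; i < n; i++ {
		// Copied into the buffer; no syscall while the buffer has room.
		if _, err := bw.WriteString(msg); err != nil {
			return cw.calls, err
		}
	}
	// Flush hands the accumulated bytes to the pipe.
	if err := bw.Flush(); err != nil {
		return cw.calls, err
	}
	return cw.calls, nil
}

func main() {
	calls, err := batchedWrites(100, "hello, world!")
	if err != nil {
		panic(err)
	}
	fmt.Printf("100 messages reached the pipe in %d write call(s)\n", calls)
}
```

This only helps when the receiver tolerates the added latency, since nothing is sent until the buffer fills up or is flushed.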

answered Sep 21 '22 02:09

kostix
