How to use unsafe to get a byte slice from a string without a memory copy

I have read "https://github.com/golang/go/issues/25484", which is about no-copy conversion from []byte to string.

I am wondering if there is a way to convert a string to a byte slice without memory copy?

I am writing a program that processes terabytes of data. If every string is copied twice in memory, it will slow things down. I do not care about mutability or safety; this is for internal use only, and I just need it to be as fast as possible.

Example:

var s string
// some processing on s; for some reason, I must use string here
// ...
// then output to a writer
gzipWriter.Write([]byte(s))  // !!! Here I want to avoid the memory copy, no WriteString

So the question is: is there a way to avoid the memory copy? I know I may need the unsafe package, but I do not know how to use it. I have searched for a while and found no answer so far, and none of the related answers SO showed me work.

Asked Nov 29 '22 08:11 by shawn


1 Answer

Getting the content of a string as a []byte without copying is, in general, only possible using unsafe. Strings in Go are immutable, and a byte slice that shares the string's memory would make it possible to modify the contents of the string (by changing the elements of the byte slice).

So using unsafe, this is how it could look (corrected, working solution):

func unsafeGetBytes(s string) []byte {
    return (*[0x7fff0000]byte)(unsafe.Pointer(
        (*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
    )[:len(s):len(s)]
}

This solution is from Ian Lance Taylor.

One thing to note here: the empty string "" has no bytes as its length is zero. This means there is no guarantee what the Data field may be, it may be zero or an arbitrary address shared among the zero-size variables. If an empty string may be passed, that must be checked explicitly (although there's no need to get the bytes of an empty string without copying...):

func unsafeGetBytes(s string) []byte {
    if s == "" {
        return nil // or []byte{}
    }
    return (*[0x7fff0000]byte)(unsafe.Pointer(
        (*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
    )[:len(s):len(s)]
}
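
For readers on Go 1.20 or later, the same conversion can probably be written without reflect.StringHeader (which is deprecated in current Go releases) by combining unsafe.StringData and unsafe.Slice. The sketch below is an illustration under that assumption; the name unsafeBytes is hypothetical, and the returned slice must never be written to.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytes (hypothetical name) returns a []byte that shares memory
// with s. Requires Go 1.20+ for unsafe.StringData.
// The caller must treat the result as read-only.
func unsafeBytes(s string) []byte {
	if s == "" {
		// StringData is unspecified for the empty string, so handle it here.
		return nil
	}
	// unsafe.Slice builds a slice whose length and capacity are both len(s).
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := unsafeBytes("gopher")
	fmt.Println(b, string(b), len(b), cap(b))
}
```

Because length and capacity are both set to len(s), an append on the result allocates a new backing array instead of writing past the end of the string.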

Original, wrong solution was:

func unsafeGetBytesWRONG(s string) []byte {
    return *(*[]byte)(unsafe.Pointer(&s)) // WRONG!!!!
}

See Nuno Cruces's answer below for reasoning.
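
A short way to see why the one-line cast is wrong: with the standard Go toolchain a string value occupies two machine words (data pointer and length), while a slice value occupies three (data pointer, length and capacity). Reinterpreting a string as a []byte therefore reads the capacity from whatever memory happens to follow the string. The sketch below only measures the two sizes and performs no unsafe conversion; headerWords is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (hypothetical helper) reports how many machine words a
// string value and a []byte value occupy.
func headerWords() (strWords, sliceWords uintptr) {
	word := unsafe.Sizeof(uintptr(0))
	var s string
	var b []byte
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	s, b := headerWords()
	fmt.Println(s, b) // with the standard gc toolchain: 2 3
}
```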

Testing it:

s := "hi"
data := unsafeGetBytes(s)
fmt.Println(data, string(data))

data = unsafeGetBytes("gopher")
fmt.Println(data, string(data))

Output (try it on the Go Playground):

[104 105] hi
[103 111 112 104 101 114] gopher

BUT: You wrote you want this because you need performance. You also mentioned you want to compress the data. Please know that compressing data (using gzip) requires a lot more computation than just copying a few bytes! You will not see any noticeable performance gain by using this!

Instead, when you want to write strings to an io.Writer, it's recommended to use the io.WriteString() function, which avoids copying the string when possible: it checks whether the writer has a WriteString() method and calls it if so, and such a method most likely handles the string better than copying it would. For details, see What's the difference between ResponseWriter.Write and io.WriteString?
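
As far as I know, *gzip.Writer has no WriteString() method of its own, so calling io.WriteString() on it directly falls back to Write([]byte(s)). One possible arrangement, sketched below, is to put a bufio.Writer in front of the gzip writer, since *bufio.Writer does implement WriteString(). This avoids allocating a temporary []byte for the whole string, although the bytes are still copied into bufio's internal buffer. The helper names gzipString and gunzip are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipString (hypothetical helper) compresses s without converting
// the whole string to a []byte first.
func gzipString(s string) ([]byte, error) {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	bw := bufio.NewWriter(zw) // *bufio.Writer has a WriteString method

	// io.WriteString detects WriteString on bw and calls it.
	if _, err := io.WriteString(bw, s); err != nil {
		return nil, err
	}
	// Flush buffered bytes into gzip, then finish the gzip stream.
	if err := bw.Flush(); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// gunzip (hypothetical helper) decompresses data, used here only to
// check the round trip.
func gunzip(data []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	plain, err := io.ReadAll(zr)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	packed, err := gzipString("gopher gopher gopher")
	if err != nil {
		panic(err)
	}
	plain, err := gunzip(packed)
	if err != nil {
		panic(err)
	}
	fmt.Println(plain)
}
```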

There are also ways to access the contents of a string without converting it to []byte, such as indexing, or using a loop where the compiler optimizes away the copy:

s := "something"
for i, v := range []byte(s) { // Copying s is optimized away
    // ...
}
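
Indexing works the same way: s[i] yields a byte and len(s) is the length in bytes, so a read-only scan needs no []byte at all. A minimal sketch, with countByte as a hypothetical helper name:

```go
package main

import "fmt"

// countByte (hypothetical helper) counts occurrences of c in s by
// indexing the string directly; no []byte is created.
func countByte(s string, c byte) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countByte("something", 'n')) // prints 1
}
```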

Also see related questions:

[]byte(string) vs []byte(*string)

What are the possible consequences of using unsafe conversion from []byte to string in go?

What is the difference between the string and []byte in Go?

Does conversion between alias types in Go create copies?

How does type conversion internally work? What is the memory utilization for the same?

Answered Dec 21 '22 23:12 by icza