How to return hash and bytes in one step in Go?

Tags: file, hash, go

I'm trying to understand how I can read the content of a file, calculate its hash, and return its bytes in one go. So far, I'm doing this in two steps, e.g.

// calculate file checksum
hasher := sha256.New()
f, err := os.Open(fname)
if err != nil {
    msg := fmt.Sprintf("Unable to open file %s, %v", fname, err)
    panic(msg)
}
defer f.Close()
_, err = io.Copy(hasher, f) // byte count is not needed; an unused variable would not compile
if err != nil {
    panic(err)
}
cksum := hex.EncodeToString(hasher.Sum(nil))

// read again (!!!) to get data as bytes array
data, err := ioutil.ReadFile(fname)

Obviously this is not the most efficient way to do it, since the file is read twice: once by io.Copy to feed the hasher, and again by ioutil.ReadFile to return the bytes. I'm struggling to understand how I can combine these steps and do everything in one go: read the data once, calculate a hash, and return it along with the byte slice to another layer.

Valentin asked Mar 11 '23 00:03



1 Answer

If you want to read a file without creating a copy of the entire file in memory, and at the same time calculate its hash, you can do so with an io.TeeReader:

hasher := sha256.New()
f, err := os.Open(fname)
if err != nil {
    panic(err)
}
defer f.Close()
data := io.TeeReader(f, hasher)
// Now read from data as usual, which is still a stream.

What happens here is that any bytes that are read from data (which is a Reader, just like the file object f is) will be pushed to hasher as well.

Note, however, that hasher will produce the correct hash only once you have read the entire file through data, and not before. So if you need the hash before you decide whether or not you want to read the file, you are left with two options: either do it in two passes (for example, as you are now), or always read the file but discard the result if the hash check fails.

If you do read the file in two passes, you could of course buffer the entire file data in a byte buffer in memory. However, the operating system will typically cache the file you just read in RAM anyway (if possible), so the performance benefit of doing a buffered two-pass solution yourself rather than just doing two passes over the file is probably negligible.

Josef Grahn answered Mar 12 '23 12:03
