I have a Go program that writes strings to a file. A loop runs 20,000 times, and each iteration writes around 20-30 strings to the file. I want to know the best way to write them.
Approach 1: Open the file at the start of the program and write each string as it is produced. That makes 20000*30 write operations.
Approach 2: Use a bytes.Buffer, store everything in the buffer, and write it to the file at the end. In this case, should the file be opened at the beginning of the code or at the end? Does it matter?
I am assuming approach 2 should work better. Can someone confirm this, with a reason? Why would writing everything at once be better than writing periodically, given that the file will be open anyway?
I am using f.WriteString(<string>) and buffer.WriteString(<some string>), where buffer is of type bytes.Buffer and f is the open file pointer.
The bufio package was created for exactly this kind of task. Instead of making a syscall for each Write call, bufio.Writer buffers up to a fixed number of bytes in internal memory before making a syscall. After the syscall, the internal buffer is reused for the next portion of data.
Compared to your second approach, bufio.Writer
- makes more syscalls (roughly N/S instead of 1)
- uses less memory (S bytes instead of N bytes)
where S is the buffer size (can be specified via bufio.NewWriterSize) and N is the total size of the data that needs to be written.
Example usage (https://play.golang.org/p/AvBE1d6wpT):
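package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	f, err := os.Create("file.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// Wrap the file in a buffered writer (default buffer size).
	w := bufio.NewWriter(f)
	fmt.Fprint(w, "Hello, ")
	fmt.Fprint(w, "world!")
	err = w.Flush() // Don't forget to flush!
	if err != nil {
		log.Fatal(err)
	}
}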
The operations that take time when writing to files are the syscalls and the disk I/O. Keeping the file open doesn't cost you anything. So naively, we could say that the second method is best.
Now, as you may know, your OS doesn't write directly to files: it keeps written data in an internal in-memory cache and does the real I/O later. I don't know the exact details of that, and generally speaking I don't need to.
What I would advise is a middle-ground solution: fill a buffer during each loop iteration and write it out once at the end of that iteration, giving one write per iteration (20,000 in your case). That way you cut a large part of the syscalls and (potentially) disk writes, without consuming too much memory for the buffer (depending on the size of your strings, that may be a point to take into account).
I would suggest benchmarking to find the best solution, but because of the caching done by the system, benchmarking disk I/O is a real nightmare.