Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Write operation cost

Tags:

performance

io

go

I have a Go program which writes strings into a file.I have a loop which is iterated 20000 times and in each iteration i am writing around 20-30 strings into a file. I just wanted to know which is the best way to write it into a file.

  • Approach 1: Keep open the file pointer at the start of the code and write it for every string. It makes it 20000*30 write operations.

  • Approach 2: Use bytes.Buffer Go and store everything in the buffer and write it at the end.Also in this case should the file pointer be opened from the beginning of the code or at the end of the code. Does it matter?

I am assuming approach 2 should work better. Can someone confirm this with a reason. How does writing at once be better than writing periodically. Because the file pointer will anyways be open. I am using f.WriteString(<string>) and buffer.WriteString(<some string>) buffer is of type bytes.Buffer and f is the file pointer open.

like image 388
KD157 Avatar asked Jan 06 '16 08:01

KD157


People also ask

What is write operation in Azure storage?

The following API calls are considered write operations: PutBlob, PutBlock, PutBlockList, AppendBlock, SnapshotBlob, CopyBlob, and SetBlobTier (when it moves a Blob from Hot to Cool, Cool to Archive, or Hot to Archive. )

How is blob storage billed?

Data storage and metadata are billed per GB on a monthly basis. For data and metadata stored for less than a month, you can estimate the impact on your monthly bill by calculating the cost of each GB per day.

What Is A blob storage?

Blob storage is a type of cloud storage for unstructured data. A "blob," which is short for Binary Large Object, is a mass of data in binary form that does not necessarily conform to any file format.

What is the cheapest Azure storage?

Azure Managed Disks The lowest cost disk is a P1 disk, which has a 4 GiB capacity and costs 60 cents per month, plus 3 cents per mount per month. The most expensive option is the P80 disk, which is 32 TiB and costs $3,276.80 per month, plus $219 per mount.


2 Answers

bufio package has been created exactly for this kind of task. Instead of making a syscall for each Write call bufio.Writer buffers up to a fixed number of bytes in the internal memory before making a syscall. After a syscall the internal buffer is reused for the next portion of data

Comparing to your second approach bufio.Writer

  • makes more syscalls (N/S instead of 1)
  • uses less memory (S bytes instead of N bytes)

where S - is buffer size (can be specified via bufio.NewWriterSize), N - total size of data that needs to be written.

Example usage (https://play.golang.org/p/AvBE1d6wpT):

f, err := os.Create("file.txt")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

w := bufio.NewWriter(f)
fmt.Fprint(w, "Hello, ")
fmt.Fprint(w, "world!")
err = w.Flush() // Don't forget to flush!
if err != nil {
    log.Fatal(err)
}
like image 79
kostya Avatar answered Oct 16 '22 02:10

kostya


The operations that take time when writing in files are the syscalls and the disk I/O. The fact that the file pointer is open doesn't cost you anything. So naively, we could say that the second method is best.

Now, as you may know, you OS doesn't directly write into files, it uses an internal in-memory cache for files that are written and do the real I/O later. I don't know the exacts details of that, and generally speaking I don't need to.

What I would advise is a middle-ground solution: do a buffer for every loop iteration, and write this one N times. That way to cut a big part of the number of syscalls and (potentially) disk writes, but without consuming too much memory with the buffer (dependeing on the size of your strings, that my be a point to be taken into account).

I would suggest benchmarking for the best solution, but due to the caching done by the system, benchmarking disk I/O is a real nightmare.

like image 44
Elwinar Avatar answered Oct 16 '22 03:10

Elwinar