
most efficient way to write data into a file

Tags: io, ruby

I want to write 2 TB of data into one file; in the future it might be a petabyte.

The data consists entirely of the character '1'. For example, 2 TB of data looks like "1111111111111......11111" (each byte is a '1').

This is my current approach:

File.open("data",File::RDWR||File::CREAT) do |file|
  2*1024*1024*1024*1024.times do
  file.write('1')
  end
end

That means file.write is called once per byte, roughly 2.2 trillion times for 2 TB. From Ruby's point of view, is there a better way to implement this?

SecureFish asked Aug 08 '12 20:08



2 Answers

You have a few problems:

  1. File::RDWR||File::CREAT always evaluates to File::RDWR, because || is the logical OR and File::RDWR is truthy. You mean File::RDWR|File::CREAT (| rather than ||).

  2. 2*1024*1024*1024*1024.times do runs the loop 1024 times, then multiplies the loop's return value (1024) by the product on the left. You mean (2*1024*1024*1024*1024).times do.
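Both pitfalls are easy to confirm in a small script. This is only a sketch: it uses a 3-iteration loop in place of the 1024 from the question, and the variable names are illustrative, not part of the original code.

```ruby
# || is logical OR: it returns its left operand when that operand is truthy,
# and every Integer is truthy in Ruby, so File::CREAT is never considered.
logical_flags = File::RDWR || File::CREAT

# | is bitwise OR: it combines both flag bits into one mode value.
bitwise_flags = File::RDWR | File::CREAT

# The method call binds tighter than *, so the block belongs to 3.times.
# Integer#times returns its receiver (3), which is then multiplied by 2.
iterations = 0
result = 2 * 3.times do
  iterations += 1
end

# Parenthesizing the product makes the whole value the loop count.
fixed_iterations = 0
(2 * 3).times do
  fixed_iterations += 1
end
```

Printing iterations and fixed_iterations afterwards shows which expression actually controlled the loop count.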

Regarding your question, I get a significant speedup by writing 1024 bytes at a time:

File.open("data",File::RDWR|File::CREAT) do |file|
  buf = "1" * 1024
  (2*1024*1024*1024).times do
    file.write(buf)
  end
end

You might experiment and find a better buffer size than 1024.
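To make that experiment easier, the buffer size can be turned into a parameter. The following is a minimal sketch, not a tuned implementation: write_ones is a hypothetical helper name, the 1 MiB default is just an assumed starting point, and it also handles totals that are not a multiple of the chunk size.

```ruby
# Hypothetical helper: writes total_bytes copies of "1" to path,
# chunk_size bytes per write call. Returns the number of bytes requested.
def write_ones(path, total_bytes, chunk_size = 1024 * 1024)
  buf = "1" * chunk_size
  # Split the total into whole chunks plus a possibly empty tail.
  full_chunks, remainder = total_bytes.divmod(chunk_size)
  File.open(path, "wb") do |file|
    full_chunks.times { file.write(buf) }
    file.write(buf[0, remainder]) if remainder > 0
  end
  total_bytes
end
```

Timing a few calls on a smaller total with different chunk_size values (for example with the standard Benchmark module) should show where larger buffers stop helping on a given machine.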

Darshan Rivka Whittle answered Oct 11 '22 14:10


I don't know which OS you are using, but the fastest approach would be to use a system copy to concatenate files into one big file; you can script that. For example, start with a string like "1" and echo it to a file:

echo "1" > file1

You can then concatenate this file with itself a number of times into a new file. On Windows you have to use the /b parameter for a binary copy to do that:

copy /b file1+file1 file2

gives you a file2 of 12 bytes (including the CR)

copy /b file2+file2 file1

gives you 24 bytes, and so on.

I will leave the math (and the fun of Rubying this) to you, but you will reach your size quickly enough, and probably faster than with the accepted answer.
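As for the math: each copy doubles the file, so a 1-byte seed reaches 2 TB (2^41 bytes) after 41 doublings. A rough Ruby sketch of the same idea follows. It is an illustration under assumptions, not a benchmarked solution: fill_by_doubling is a hypothetical helper, it appends the file to itself in place with IO.copy_stream instead of alternating between two files, and it seeds the file with a bare "1" (no quotes or line ending). Whether it beats large buffered writes depends on the OS and filesystem.

```ruby
# Hypothetical helper: grows the file at path to exactly target_bytes
# bytes of "1" by repeatedly appending the file's own contents to itself.
# Assumes target_bytes >= 1.
def fill_by_doubling(path, target_bytes)
  File.binwrite(path, "1") # one-byte seed
  size = 1
  while size < target_bytes
    # Double the file, or copy just enough to hit the target on the last pass.
    chunk = [size, target_bytes - size].min
    File.open(path, "rb") do |src|
      File.open(path, "r+b") do |dst|
        dst.seek(0, IO::SEEK_END)       # write at the current end of file
        IO.copy_stream(src, dst, chunk) # copy the first chunk bytes of src
      end
    end
    size += chunk
  end
  size
end
```

Because every pass reads only bytes that already existed before the pass started, the source and destination ranges never overlap.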

peter answered Oct 11 '22 13:10