
Julia: fast file write on the fly

I am coding a solver which needs to write a few numbers to a file at each time step. The time step must be small, so I need to write the output often.

The profiling picture shows that the highlighted IO section takes a sizeable share of the execution time.

The IO is done as

println(out_file, t, " ", v.P[1], " ", v.P[end])

where I want to save the first and last element of the vector P inside the data structure v as well as the value of t.

From the profiling it seems that most of the computational time is spent in string.jl (which is not code I wrote).

This makes me wonder whether there is a more efficient way to iteratively write the output to file.

Any suggestion?

Thanks

Additional info

The output file is opened once at the beginning of the execution and left open until the end. I cannot report the entire code as it is very long, but it is something like

out_file = open("file.out", "w")

delta_t = computeDeltaT()
t = 0
while t<T
  P = computeP()

  println(out_file, t, " ", P[1], " ", P[end])

  delta_t = computeDeltaT()
  t += delta_t
end

close(out_file)

I need to write iteratively because the solution develops in time and I do not know how delta_t will change, so I cannot pre-allocate the output for P. It would also be a huge matrix, something like millions of rows by 5 columns.

EDIT

@isebarn: printing only every 100 steps indeed reduces the execution time. I'll also try adding a second worker to handle the IO so I do not lose data.
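The every-N-steps idea can be sketched as below; this is my own sketch, not the asker's code, with the solver's computeP and computeDeltaT replaced by stand-ins. Lines are staged in an in-memory IOBuffer and flushed to the file in batches, so the expensive number-to-string formatting still happens but the file stream is touched far less often.

```julia
# Sketch: buffer output lines in memory and flush to disk every N steps.
# sin/cos and the fixed increment are stand-ins for computeP()/computeDeltaT().
function run_solver(outfile; N = 100)
    buf = IOBuffer()                  # in-memory staging buffer
    open(outfile, "w") do f
        t, step = 0.0, 0
        while t < 1.0
            P = [sin(t), cos(t)]      # stand-in for computeP()
            println(buf, t, " ", P[1], " ", P[end])
            step += 1
            if step % N == 0
                write(f, take!(buf))  # take! empties the buffer
            end
            t += 0.0078125            # stand-in for computeDeltaT()
        end
        write(f, take!(buf))          # flush whatever remains
    end
end

run_solver("file.out"; N = 10)
```

Choosing N trades execution time against how much data is lost if the run crashes between flushes.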

asked Oct 19 '22 by melisale

1 Answer

By iteratively, do you mean another application/program must be able to read the file in between writes? Otherwise you can just open the stream once and close it at the end.

f = open(outfile, "w") # do this once
for i in someloop
    # do something
    write(f, "whatever") # written to the stream, but not necessarily flushed to disk yet
end
close(f) # now everything is flushed to disk (i.e. now outfile will have changed)

If you need to access the file during the process, you can open/close it on every iteration (maybe write is faster than println; profile it to check), or open/close the stream every N iterations to balance the two.
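The open/close-every-N-iterations variant might look like the sketch below (my own illustration with made-up data, not the original solver). Closing guarantees the buffered lines hit the disk, and reopening with "a" appends instead of overwriting.

```julia
# Sketch: close the stream every N writes so other programs can read the
# file mid-run, then reopen in append mode ("a") to keep earlier lines.
function checkpointed_write(outfile, values; N = 3)
    isfile(outfile) && rm(outfile)    # start fresh
    f = open(outfile, "w")
    for (i, v) in enumerate(values)
        println(f, v)
        if i % N == 0
            close(f)                  # data now visible on disk
            f = open(outfile, "a")    # append, don't overwrite
        end
    end
    close(f)
end

checkpointed_write("file.out", 1:10)
```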

Edit: Source: http://docs.julialang.org/en/release-0.4/manual/networking-and-streams/

As @isebarn said, writing binary to HDF5 may also be faster. Not sure though.
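HDF5 needs the HDF5.jl package; a dependency-free sketch of the same binary-output idea (my own, not from the answer) is to write raw Float64 triples with the stdlib write, which skips the decimal formatting in string.jl entirely:

```julia
# Sketch: write each (t, P[1], P[end]) as three raw Float64s (24 bytes/step).
# write(io, x::Float64) emits the 8-byte binary representation directly,
# so no number-to-string conversion happens on the hot path.
open("file.bin", "w") do f
    for t in 0.0:0.25:0.75            # stand-in time steps
        write(f, t, sin(t), cos(t))   # stand-ins for t, P[1], P[end]
    end
end

# Reading back: reinterpret the byte stream as Float64s.
data = reinterpret(Float64, read("file.bin"))
```

The trade-off is that the file is no longer human-readable and must be decoded with the same layout assumptions it was written with.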

Also, IO is quite often a limiting factor in these sorts of scenarios. The other thing to try: if there is a way to estimate the size of P, you could pre-allocate and then trim it?

answered Oct 29 '22 by Alexander Morley