EDIT: Based on additional experimentation, I'm fairly confident the slowdown occurs in response to many calls to the file open and close routines (h5open and close). I'm a bit short on time right now, but will come back and add more code/detail in the next few days.
Using the HDF5 package for Julia, I've noticed that the h5write function starts to slow down if one performs many repeated calls to h5write and h5read. Interestingly, it appears that for the behaviour to be really obvious, one should be reading and writing to a large number of locations in a small number of files. A demonstration of the behaviour I'm talking about can be obtained by starting a Julia session and running the following subroutine (note: you will need the HDF5 package):
#Set parameters
numFile = 10;
numLocation = 10000;
writeDir = "/home/colin/Temp/";
FloatOut = 5.5;
#Import functions
using HDF5
#Loop over read/writes, timing the first and last location in each file.
c1 = 1;
timeMat = Array{Float64}(undef, numFile * 2, 2);
for i in 1:numFile
    global c1  #Update the global counter inside the loop
    filePath = string(writeDir, "f", string(i), ".h5");
    for j in 1:numLocation
        location = string("G1/L", string(j));
        #Time the write to the first and last location in each file
        if j == 1 || j == numLocation; t0 = time(); end;
        h5write(filePath, location, FloatOut);
        if j == 1 || j == numLocation; timeMat[c1, 1] = time() - t0; end;
        #Time the corresponding read
        if j == 1 || j == numLocation; t0 = time(); end;
        FloatIn = h5read(filePath, location);
        if j == 1 || j == numLocation; timeMat[c1, 2] = time() - t0; end;
        if j == 1 || j == numLocation; c1 = c1 + 1; end;
    end
    rm(filePath);
end
This code writes the floating-point number 5.5 (chosen for no particular reason) to 10,000 locations in each of 10 files using h5write. Immediately after each write, the number is read back in using h5read. For each file, I store in timeMat the time taken to write and read the first and last location (note: initially I stored the timing for every call, but that level of detail is unnecessary to demonstrate the anomaly for the purposes of this question). The times are printed below:
h5write h5read
0.0007 0.0004
0.0020 0.0004
0.0020 0.0004
0.0031 0.0004
0.0034 0.0004
0.0049 0.0004
0.0050 0.0004
0.0064 0.0005
0.0068 0.0004
0.0082 0.0004
0.0084 0.0005
0.0106 0.0005
0.0114 0.0005
0.0114 0.0005
0.0120 0.0005
0.0131 0.0005
0.0135 0.0005
0.0146 0.0005
0.0151 0.0005
0.0163 0.0005
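For reference, a table in the format above can be printed from timeMat with something like this (just one way to format it; the exact printing code is beside the point):
using Printf
println("h5write h5read")
for r in 1:size(timeMat, 1)
    @printf("%.4f %.4f\n", timeMat[r, 1], timeMat[r, 2])
end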
The timings for h5read are fairly consistent across the subroutine. However, the timings for h5write gradually get slower: by the end, a write takes an order of magnitude longer than at the start. I understand that, within each file, a write (and read) might get slightly slower as the number of locations grows. But in this case the slower performance persists even after we begin a new file. Strangest of all, if we run the subroutine a second time, the time taken for a write picks up where it left off on the previous run. The only way to get writes back to their original speed is to completely restart Julia.
Final disclaimer: I am brand new to both Julia and HDF5, so I may have done something stupid or be overlooking something obvious.
The slowdown is indeed curious; profiling shows that almost all the time is spent in the close function, which is basically just a ccall. This suggests it may be a problem with the HDF5 C library itself.
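If you want to reproduce the profiling, something along these lines is enough (a sketch; the file path and iteration count are arbitrary):
using HDF5, Profile

profPath = "/tmp/h5write_profile.h5"   # throwaway file; any writable path will do
isfile(profPath) && rm(profPath)
Profile.clear()
@profile for j in 1:2000
    # each h5write call opens the file, writes one dataset, and closes it again
    h5write(profPath, string("G1/L", j), 5.5)
end
Profile.print(format = :flat)          # look for the frames associated with close
rm(profPath)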
I think you'll be rather happier with the performance if you don't open and close the file each time you write a variable; instead, access the file through a file object. Here's an example:
filePath = string(writeDir, "f", string(i), ".h5")
h5open(filePath, "w") do file
    global c1
    for j in 1:numLocation
        ...
        write(file, location, FloatOut)
        ...
        FloatIn = read(file, location)
        ...
    end
end
This way you're leaving the file open throughout the test. On my machine this is something like 100x faster.
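Spelled out with the same timing bookkeeping as in your loop, the whole test might look like this (an untested sketch; it assumes the variables from your setup block, numFile, numLocation, writeDir, FloatOut, timeMat, and c1, are already defined):
for i in 1:numFile
    filePath = string(writeDir, "f", string(i), ".h5")
    h5open(filePath, "w") do file
        global c1
        for j in 1:numLocation
            location = string("G1/L", string(j))
            if j == 1 || j == numLocation; t0 = time(); end
            write(file, location, FloatOut)                  # write through the open file object
            if j == 1 || j == numLocation; timeMat[c1, 1] = time() - t0; end
            if j == 1 || j == numLocation; t0 = time(); end
            FloatIn = read(file, location)                   # read through the same object
            if j == 1 || j == numLocation; timeMat[c1, 2] = time() - t0; end
            if j == 1 || j == numLocation; c1 = c1 + 1; end
        end
    end
    rm(filePath)
end
The only real change from your version is that each file is opened once, written to and read from through the returned file object, and closed once when the do block exits.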
If you want to pursue this further, please submit an issue.