Hello, I have Fortran code for reading in ASCII double precision data (example of data file at bottom of question):
program ReadData
integer :: mx,my,mz
doubleprecision, allocatable, dimension(:,:,:) :: charge
! Open the file 'CHGCAR'
open(11,file='CHGCAR',status='old')
! Get the extent of the 3D system and allocate the 3D array
read(11,*)mx,my,mz
allocate(charge(mx,my,mz) )
! Bulk read the entire block of ASCII data for the system
read(11,*) charge
end program ReadData
and the "equivalent" C++ code:
#include <fstream>
#include <vector>
using std::ifstream;
using std::vector;
using std::ios;
int main(){
int mx, my, mz;
// Open the file 'CHGCAR'
ifstream InFile('CHGCAR', ios::in);
// Get the extent of the 3D system and allocate the 3D array
InFile >> mx >> my >> mz;
vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));
// Method 1: std::ifstream extraction operator to double
for (int i = 0; i < mx; ++i)
for (int j = 0; j < my; ++j)
for (int k = 0; k < mz; ++k)
InFile >> charge[i][j][k];
return 0;
}
Note that the line
read(11,*) charge
performs the same task as the C++ code:
for (int i = 0; i < mx; ++i)
for (int j = 0; j < my; ++j)
for (int k = 0; k < mz; ++k)
InFile >> charge[i][j][k];
where InFile
is an if stream
object (note that while iterators in the Fortran code start at 1 and not 0, the range is the same).
However, the Fortran code runs way, way faster than the C++ code, I think because Fortran does something clever like reading/parsing the file according to the range and shape (values of mx
, my
, mz
) all in one go, and then simply pointing charge
to the memory the data was read to. The C++ code, by comparison, needs to access InFile
and then charge
(which is typically large) back and forth with each iteration, resulting in (I believe) many more IO and memory operations.
I'm reading in potentially billions of of values (several gigabytes), so I really want to maximize performance.
How can I achieve the performance of the Fortran code in C++?
Here is a much faster (than the above C++) C++ implementation, where the file is read in one go into a char
array, and then charge
is populated as the char
array is parsed:
#include <fstream>
#include <vector>
#include <cstdlib>
using std::ifstream;
using std::vector;
using std::ios;
int main(){
int mx, my, mz;
// Open the file 'CHGCAR'
ifstream InFile('CHGCAR', ios::in);
// Get the extent of the 3D system and allocate the 3D array
InFile >> mx >> my >> mz;
vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));
// Method 2: big char array with strtok() and atof()
// Get file size
InFile.seekg(0, InFile.end);
int FileSize = InFile.tellg();
InFile.seekg(0, InFile.beg);
// Read in entire file to FileData
vector<char> FileData(FileSize);
InFile.read(FileData.data(), FileSize);
InFile.close();
/*
* Now simply parse through the char array, saving each
* value to its place in the array of charge density
*/
char* TmpCStr = strtok(FileData.data(), " \n");
// Gets TmpCStr to the first data value
for (int i = 0; i < 3 && TmpCStr != NULL; ++i)
TmpCStr = strtok(NULL, " \n");
for (int i = 0; i < Mz; ++i)
for (int j = 0; j < My; ++j)
for (int k = 0; k < Mx && TmpCStr != NULL; ++k){
Charge[i][j][k] = atof(TmpCStr);
TmpCStr = strtok(NULL, " \n");
}
return 0;
}
Again, this is much faster than the simple >>
operator-based method, but still considerably slower than the Fortran version--not to mention much more code.
I'm sure that method 2 is the way to go if I am to implement it myself, but I'm curious how I can increase performance to match the Fortran code. The types of things I'm considering and currently researching are:
strtok()
char
to double
conversion than atof()
In particular, the C++ String Toolkit Library will take FileData
and the delimiters " \n"
and give me a string token object (call it FileTokens
, then the triple for
loop would look like
for (int k = 0; k < Mz; ++k)
for (int j = 0; j < My; ++j)
for (int i = 0; i < Mx; ++i)
Charge[k][j][i] = FileTokens.nextFloatToken();
This would simplify the code slightly, but there is extra work in copying (in essence) the contents of FileData
into FileTokens
, which might kill any performance gains from using the nextFloatToken()
method (presumedly more efficient than the strtok()
/atof()
combination).
There is an example on the C++ String Toolkit (StrTk) Tokenizer tutorial page (included at the bottom of the question) using StrTk's for_each_line()
processor that looks to be similar to my desired application. A difference between the cases, however, is that I cannot assume how many data will appear on each line of the input file, and I do not know enough about StrTk to say if this is a viable solution.
The topic of fast reading of ASCII data to an array or struct has come up before, but I have reviewed the following posts and their solutions were not sufficient:
Here is an example of the data file I'm importing. The ASCII data is delimited by spaces and line breaks like the below example:
5 3 3
0.23080516813E+04 0.22712439791E+04 0.21616898980E+04 0.19829996749E+04 0.17438686650E+04
0.14601734127E+04 0.11551623512E+04 0.85678544224E+03 0.59238325489E+03 0.38232265554E+03
0.23514479113E+03 0.14651943589E+03 0.10252743482E+03 0.85927499703E+02 0.86525872161E+02
0.10141182750E+03 0.13113419142E+03 0.18057147781E+03 0.25973252462E+03 0.38303754418E+03
0.57142097675E+03 0.85963728360E+03 0.12548019843E+04 0.17106124085E+04 0.21415379433E+04
0.24687336309E+04 0.26588012477E+04 0.27189091499E+04 0.26588012477E+04 0.24687336309E+04
0.21415379433E+04 0.17106124085E+04 0.12548019843E+04 0.85963728360E+03 0.57142097675E+03
0.38303754418E+03 0.25973252462E+03 0.18057147781E+03 0.13113419142E+03 0.10141182750E+03
0.86525872161E+02 0.85927499703E+02 0.10252743482E+03 0.14651943589E+03 0.23514479113E+03
Here is the StrTk example mentioned above. The scenario is parsing the data file that contains the information for a 3D mesh:
input data:
5
+1.0,+1.0,+1.0
-1.0,+1.0,-1.0
-1.0,-1.0,+1.0
+1.0,-1.0,-1.0
+0.0,+0.0,+0.0
4
0,1,4
1,2,4
2,3,4
3,1,4
code:
struct point
{
double x,y,z;
};
struct triangle
{
std::size_t i0,i1,i2;
};
int main()
{
std::string mesh_file = "mesh.txt";
std::ifstream stream(mesh_file.c_str());
std::string s;
// Process points section
std::deque<point> points;
point p;
std::size_t point_count = 0;
strtk::parse_line(stream," ",point_count);
strtk::for_each_line_n(stream,
point_count,
[&points,&p](const std::string& line)
{
if (strtk::parse(line,",",p.x,p.y,p.z))
points.push_back(p);
});
// Process triangles section
std::deque<triangle> triangles;
triangle t;
std::size_t triangle_count = 0;
strtk::parse_line(stream," ",triangle_count);
strtk::for_each_line_n(stream,
triangle_count,
[&triangles,&t](const std::string& line)
{
if (strtk::parse(line,",",t.i0,t.i1,t.i2))
triangles.push_back(t);
});
return 0;
}
This...
vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));
...creates a temporary vector<double>(mz)
, with all 0.0 values, and copies it my
times (or perhaps moves then copies my-1
times with a C++11 compiler, but little difference...) to create a temporary vector<vector<double>>(my, ...)
which is then copied mx
times (...as above...) to initialise all the data. You're reading data in over these elements anyway - there's no need to spend time initialising it here. Instead, create an empty charge
and use nested loops to reserve()
enough memory for the elements without populating them yet.
Next, check you're compiling with optimisation on. If you are and you're still slower than FORTRAN, in the data-populating nested loops try creating a reference to the vector you're about .emplace_back
elements on to:
for (int i = 0; i < mx; ++i)
for (int j = 0; j < my; ++j)
{
std::vector<double>& v = charge[i][j];
for (int k = 0; k < mz; ++k)
{
double d;
InFile >> d;
v.emplace_pack(d);
}
}
That shouldn't help if your optimiser's done a good job, but is worth trying as a sanity check.
If you're still slower - or just want to try to be even faster - you could try optimising your number parsing: you say your data's all formatted ala 0.23080516813E+04
- with fixed sizes like that you can easily calculate how many bytes to read into a buffer to give you a decent number of values from memory, then for each you could start an atol
after the .
to extract 23080516813 then multiply it by 10 to the power of minus (11 (your number of digits) minus 04): for speed, keep a table of those powers of ten and index into it using the extracted exponent (i.e. 4). (Note multiplying by e.g. 1E-7 can be faster than dividing by 1E7 on a lot of common hardware.)
And if you want to blitz this thing, switch to using memory mapped file access. Worth considering boost::mapped_file_source
as it's easier to use than even the POSIX API (let alone Windows), and portable, but programming directly against an OS API shouldn't be much of a struggle either.
Example of using boost memory mapping:
#include <boost/iostreams/device/mapped_file.hpp>
boost::mapped_file_params params("dbldat.in");
boost::mapped_file_source file(params);
file.open();
ASSERT(file.is_open());
const char* p = file.data();
const char* nl = strchr(p, '\n');
std::istringstream iss(std::string(p, nl - p));
size_t x, y, z;
ASSERT(iss >> x >> y >> z);
The above maps a file into memory at address p
, then parses the dimensions from the first line. Continue parsing the actual double
representations from ++nl
onwards. I mention an approach to that above, and you're concerned about the data format changing: you could add a version number to the file, so you can use optimised parsing until the version number changes then fall back on something generic for "unknown" file formats. As far as something generic goes, for in-memory representations using int chars_to_skip; double my_double; ASSERT(sscanf(ptr, "%f%n", &my_double, &chars_to_skip) == 1);
is reasonable: see sscanf
docs here - you can then advance the pointer through the data by chars_to_skip
.
Next, are you suggesting to combine the
reserve()
solution with the reference creation solution?
Yes.
And (pardon my ignorance) why would using a reference to
charge[i][j]
andv.emplace_back()
be better thancharge[i][j].emplace_back()
?
That suggestion was to sanity check that the compiler's not repeatedly evaluating charge[i][j]
for each element being emplaced: hopefully it will make no performance difference and you can go back to the charge[i][j].emplace()
, but IMHO it's worth a quick check.
Lastly, I'm skeptical about using an empty vector and reserve()ing at the tops of each loop. I have another program that came to a grinding halt using that method, and replacing the reserve()s with a preallocated multidimensional vector sped it up a lot.
That's possible, but not necessarily true in general or applicable here - a lot depends on the compiler / optimiser (particularly loop unrolling) etc.. With unoptimised emplace_back
you're having to check vector size()
against capacity()
repeatedly, but if the optimiser does a good job that should be reduced to insignificance. As with a lot of performance tuning, you often can't reason about things perfectly and conclude what's going to be fastest, and will have to try alternatives and measure them with your actual compiler, program data etc..
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With