I have a text file with 20000 lines to read and process. From the text file I want to read the point coordinates and hand them to DirectX to render. (Snapshot of the text file omitted.)
I have used std::ifstream, getline, and stringstream to get the point coordinates. After building the Win32 program and running it, it takes far too long to read and store the point coordinates in the array (about 5 minutes to go through the 20000-line text file). Code as below:
struct PointCoord
{
    std::string PtName;
    float PtX = 0.0f;
    float PtY = 0.0f;
};

PointCoord *PointPtr = NULL;
PointCoord pointcoord;
UINT Pt_Count = 0;
std::string TextHandler;

std::ifstream File_read(FileNameTXT);
while (getline(File_read, TextHandler))
{
    std::istringstream iss;
    std::string skip;
    if (TextHandler.find(" POINT ") != std::string::npos)
    {
        iss.str(TextHandler);
        std::string TempX, TempY;
        iss >> skip;
        iss >> pointcoord.PtName;   // pass the name to pointcoord
        iss >> TempX;
        iss >> TempY;
        pointcoord.PtX = std::stof(TempX);
        pointcoord.PtY = std::stof(TempY);
        // dynamically store the point coordinates
        if (PointPtr == NULL)
        {
            PointPtr = new PointCoord[1];
            // pointcoord passes its value to PointPtr
            PointPtr[0] = pointcoord;
            Pt_Count++;
        }
        else
        {
            PointCoord *Temp = PointPtr;
            PointPtr = new PointCoord[Pt_Count + 1];
            for (UINT i = 0; i < Pt_Count; i++)
            {
                PointPtr[i] = Temp[i];
            }
            PointPtr[Pt_Count] = pointcoord;
            Pt_Count++;
            delete[] Temp;
        }
    } // end of loading points
} // end of getline
I also tried std::fread to read the whole text file at once into a string buffer, which is fast (reading done in a few seconds). Then I used a stringstream in similar code to store the point coordinates in the dynamic array, which was also too slow.
Any suggestions are welcome. Many thanks.
The most damning thing in this code isn't the string parsing; it's the resizing of the target array for every new point read. The more you read, the worse it gets; ultimately it becomes a copy operation on the order of O(n^2).
To give you an idea how bad that is, consider the basic summation of the first n natural numbers, because that's how many object constructions, destructions, and copies you're doing:
n(n+1)/2 = (20000 * 20001)/2 = 200010000 objects created, copied, and destroyed
So, string parsing isn't the problem. The cost of parsing 20000 lines of text is dwarfed by the over-200-million object constructions, destructions, and copies.
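If you want to verify that count empirically, here is a minimal sketch using an instrumented element type that counts its copies (Counted is a made-up name for illustration, not from the original code). The grow-by-one scheme from the question racks up n(n+1)/2 copies; std::vector's geometric growth stays linear:

#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical instrumented type: counts every copy made.
struct Counted
{
    static std::size_t copies;
    Counted() = default;
    Counted(Counted const&) { ++copies; }
    Counted& operator=(Counted const&) { ++copies; return *this; }
};
std::size_t Counted::copies = 0;

int main()
{
    const std::size_t n = 20000;

    // Grow-by-one, as in the question: reallocate and copy on every insert.
    Counted* arr = nullptr;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        Counted* tmp = new Counted[count + 1];
        for (std::size_t j = 0; j < count; ++j)
            tmp[j] = arr[j];
        tmp[count] = Counted();   // the newly read element
        delete[] arr;
        arr = tmp;
        ++count;
    }
    std::cout << "grow-by-one copies: " << Counted::copies << '\n'; // n(n+1)/2
    delete[] arr;

    Counted::copies = 0;

    // Geometric growth: std::vector copies each element O(1) times on average.
    std::vector<Counted> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(Counted());
    std::cout << "vector copies: " << Counted::copies << '\n';      // O(n)
}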
You don't need to do any of that. Use an appropriate container such as std::vector and approximate the initial reserve based on the file size. Then, just generate the points and move them into the container.
An example that does this, including generation of a test file of 100000 points (5x the size you're asking about), appears below:
Code
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>
#include <random>
#include <chrono>

struct Point
{
    std::string name;
    float x;
    float y;
};

std::vector<Point> readFile(std::string const& fname)
{
    std::vector<Point> res;
    std::ifstream inp(fname);
    if (inp.is_open())
    {
        // gather file size for a rough approximation of reserve,
        // assuming around 40 bytes per line of text
        inp.seekg(0, std::ios::end);
        auto flen = inp.tellg();
        inp.seekg(0, std::ios::beg);
        res.reserve(flen / 40);

        std::string line;
        while (std::getline(inp, line))
        {
            auto pos = line.find("POINT");
            if (pos != std::string::npos)
            {
                std::istringstream iss(line.substr(pos + 5));
                Point pt;
                if (iss >> pt.name >> pt.x >> pt.y)
                    res.emplace_back(std::move(pt));
            }
        }
    }
    return res;
}

int main()
{
    using namespace std::chrono;
    std::mt19937 prng(std::random_device{}());
    std::uniform_real_distribution<float> dist(-100.0, 100.0);

    // generate test data
    std::ofstream outf("testpoints.txt");
    for (int i = 1; i <= 100000; ++i)
        outf << "POINT \"" << i << "\" " << dist(prng) << ' ' << dist(prng) << '\n';
    outf.close();

    // rough benchmark
    auto tp0 = steady_clock::now();
    auto v = readFile("testpoints.txt");
    auto tp1 = steady_clock::now();

    std::cout << "Read " << v.size() << " points in ";
    std::cout << duration_cast<milliseconds>(tp1 - tp0).count() << "ms\n";

    v.clear();
}
Output
Running on a 2015 dual-core i7 MacBook Air laptop, a release-mode build produces the following result:
Read 100000 points in 164ms
A Possibly More Appropriate Container: std::deque
In the end, what you really need is a container that allows quick insertion at the end, while minimizing (or eliminating) element copying during resizing. Certainly, as the code above shows, setting up a reserve against a std::vector is one way to do that. Another option is to use a container that actually specializes in end-insertions, while still being mostly cache-friendly (not perfect like std::vector, but certainly better than something like linked lists, which are great for insertions but dreadful for enumeration).
That's exactly what std::deque will do. The code above, changed to use std::deque, allows you to eliminate the reserve guess and simply start slamming nodes onto the end of the sequence, which will automatically add more pages as the sequence grows:
Code
#include <iostream>
#include <fstream>
#include <sstream>
#include <deque>
#include <string>
#include <random>
#include <chrono>

struct Point
{
    std::string name;
    float x;
    float y;
};

std::deque<Point> readFile(std::string const& fname)
{
    std::deque<Point> res;
    std::ifstream inp(fname);
    if (inp.is_open())
    {
        std::string line;
        while (std::getline(inp, line))
        {
            auto pos = line.find("POINT");
            if (pos != std::string::npos)
            {
                std::istringstream iss(line.substr(pos + 5));
                Point pt;
                if (iss >> pt.name >> pt.x >> pt.y)
                    res.emplace_back(std::move(pt));
            }
        }
    }
    return res;
}

int main()
{
    using namespace std::chrono;
    std::mt19937 prng(std::random_device{}());
    std::uniform_real_distribution<float> dist(-100.0, 100.0);

    // generate test data
    std::ofstream outf("testpoints.txt");
    for (int i = 1; i <= 100000; ++i)
        outf << "POINT \"" << i << "\" " << dist(prng) << ' ' << dist(prng) << '\n';
    outf.close();

    // rough benchmark
    auto tp0 = steady_clock::now();
    auto v = readFile("testpoints.txt");
    auto tp1 = steady_clock::now();

    std::cout << "Read " << v.size() << " points in ";
    std::cout << duration_cast<milliseconds>(tp1 - tp0).count() << "ms\n";

    v.clear();
}
Output
Read 100000 points in 160ms
If your needs require a contiguous sequence, the std::vector approach is the way to go. If you just need random access to the elements and want fast end-insertions, a std::deque may be a better fit. Think about that, and choose the best fit for you.
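One note on the contiguity point, since these points are headed for DirectX: vertex-buffer uploads typically want a single contiguous block of memory. A std::vector can hand its storage over directly via data(), while a std::deque has to be flattened first. A minimal sketch, with a hypothetical uploadVertexBuffer() standing in for the actual buffer-creation call and a stripped-down vertex type:

#include <cstddef>
#include <deque>
#include <vector>

struct Vertex { float x; float y; };

// Hypothetical stand-in for an API call (e.g. D3D vertex-buffer creation)
// that requires one contiguous block of memory.
void uploadVertexBuffer(const void* /*data*/, std::size_t /*bytes*/)
{
    // Real code would copy the block into GPU memory; omitted here.
}

void uploadPoints(std::vector<Vertex> const& v)
{
    // std::vector is contiguous: hand its storage over directly.
    uploadVertexBuffer(v.data(), v.size() * sizeof(Vertex));
}

void uploadPoints(std::deque<Vertex> const& d)
{
    // std::deque stores separate pages, not one block: flatten it first.
    std::vector<Vertex> tmp(d.begin(), d.end());
    uploadVertexBuffer(tmp.data(), tmp.size() * sizeof(Vertex));
}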
Summary
Get rid of that dreadful expansion algorithm. It's the sore point in your code. Replace it with a geometric resize algorithm, and start with a rough approximation of the number of elements you're going to need from the beginning. Or use a container suited for optimal end-insertions. Either way, it's better than what you have now.
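If you do want to keep the raw array, here is a minimal sketch (not from the original code) of that geometric resize: double the capacity whenever the array is full, so each element is copied O(1) times on average rather than once per subsequent insertion. This is, in effect, what std::vector already does for you:

#include <cstddef>
#include <string>
#include <utility>

struct PointCoord
{
    std::string PtName;
    float PtX = 0.0f;
    float PtY = 0.0f;
};

// Append with geometric growth: reallocation happens O(log n) times
// for n insertions instead of once per insertion.
void appendPoint(PointCoord*& arr, std::size_t& count, std::size_t& capacity,
                 PointCoord point)
{
    if (count == capacity)
    {
        std::size_t newCap = (capacity == 0) ? 16 : capacity * 2;
        PointCoord* tmp = new PointCoord[newCap];
        for (std::size_t i = 0; i < count; ++i)
            tmp[i] = std::move(arr[i]); // move, don't copy, existing elements
        delete[] arr;
        arr = tmp;
        capacity = newCap;
    }
    arr[count++] = std::move(point);
}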