I have the following code, which parses a text file and indexes its words and lines:
bool Database::addFromFileToListAndIndex(string path, BSTIndex* & index, list<Line *> & myList)
{
    bool result = false;
    ifstream txtFile;
    txtFile.open(path, ifstream::in);
    char line[200];
    Line * ln;
    //if path is valid AND is not already in the list then add it
    if(txtFile.is_open() && (find(textFilePaths.begin(), textFilePaths.end(), path) == textFilePaths.end())) //the path is valid
    {
        //Add the path to the list of file paths
        textFilePaths.push_back(path);
        int lineNumber = 1;
        while(!txtFile.eof())
        {
            txtFile.getline(line, 200);
            ln = new Line(line, path, lineNumber);
            if(ln->getLine() != "")
            {
                lineNumber++;
                myList.push_back(ln);
                vector<string> words = lineParser(ln);
                for(unsigned int i = 0; i < words.size(); i++)
                {
                    index->addWord(words[i], ln);
                }
            }
        }
        result = true;
    }
    return result;
}
My code works flawlessly and fairly quickly until I give it a HUGE text file. Then I get a stack overflow error from Visual Studio. When I switch to "Release" configuration the code runs without a hitch. Is there something wrong with my code or is there some kind of limitation when running the "Debug" configuration? Am I trying to do too much in one function? If so how can I break it up so it doesn't crash while debugging?
EDIT: Per request, my implementation of addWord:
void BSTIndex::addWord(BSTIndexNode *& pCurrentRoot, string word, Line * pLine)
{
    if(pCurrentRoot == NULL) //BST is empty
    {
        BSTIndexNode * nodeToAdd = new BSTIndexNode();
        nodeToAdd->word = word;
        nodeToAdd->pData = pLine;
        pCurrentRoot = nodeToAdd;
        return;
    }
    //BST not empty
    if (word < (pCurrentRoot->word)) //Go left
    {
        addWord(pCurrentRoot->pLeft, word, pLine);
    }
    else //Go right
    {
        addWord(pCurrentRoot->pRight, word, pLine);
    }
}
And lineParser:
vector<string> Database::lineParser(Line * ln) //Parses a line and returns a vector of the words it contains
{
    vector<string> result;
    string word;
    string line = ln->getLine();
    //Regular Expression, matches anything that is not a letter, number, whitespace, or apostrophe
    tr1::regex regEx("[^A-Za-z0-9\\s\\']");
    //Using regEx above, replaces all non matching characters with nothing, essentially removing them.
    line = tr1::regex_replace(line, regEx, std::string(""));
    istringstream iss(line);
    while(iss >> word)
    {
        word = getLowercaseWord(word);
        result.push_back(word);
    }
    return result;
}
A stack overflow means that you've run out of stack space (probably obvious, but just in case). Typical causes are non-terminating or excessive recursion, or very large objects being duplicated on the stack. Funnily enough, it might be either in this case.
It's likely that in Release your compiler is performing tail call optimization, which keeps the stack from growing with each recursive call.
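It's also likely that in Release your compiler is optimizing away the copy of the vector returned from lineParser, although a vector keeps its elements on the heap, so that copy is the less likely source of stack pressure.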
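Because the tail call optimization mentioned above is not guaranteed, and is normally off in Debug, one option is to write the loop by hand. The following is only a sketch under assumptions: the BSTIndexNode layout is inferred from the addWord code in the question, Line is merely forward-declared, the name addWordIterative is made up for illustration, and it uses C++11 features (nullptr, default member initializers) that an older Visual Studio may not accept.

```cpp
#include <string>

struct Line;  // assumption: the asker's class, only used through a pointer here

// Assumed node layout, inferred from the members addWord touches in the question.
struct BSTIndexNode {
    std::string word;
    Line* pData = nullptr;
    BSTIndexNode* pLeft = nullptr;
    BSTIndexNode* pRight = nullptr;
};

// Iterative insertion (hypothetical name). It walks down the tree with a
// pointer-to-pointer instead of recursing, so stack usage stays constant
// no matter how deep the tree becomes.
void addWordIterative(BSTIndexNode*& root, const std::string& word, Line* pLine)
{
    BSTIndexNode** pCurrent = &root;
    while (*pCurrent != nullptr) {
        if (word < (*pCurrent)->word)
            pCurrent = &(*pCurrent)->pLeft;   // go left
        else
            pCurrent = &(*pCurrent)->pRight;  // go right
    }
    // *pCurrent is now the empty slot where the new node belongs.
    BSTIndexNode* nodeToAdd = new BSTIndexNode();
    nodeToAdd->word = word;
    nodeToAdd->pData = pLine;
    *pCurrent = nodeToAdd;
}
```

This keeps the same ordering rule as the recursive version (smaller words go left, everything else goes right), so the resulting tree has the same shape; only the stack usage differs.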
So you need to find out which condition is overflowing in Debug. I would start with the recursion as the most likely culprit. Try changing the string parameter type to a const reference, i.e.
void BSTIndex::addWord(BSTIndexNode *& pCurrentRoot, const string & word, Line * pLine)
This should stop you from duplicating the word object on each nested invocation of addWord.
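To make the per-call duplication visible, here is a small self-contained sketch. CountedWord, descendByValue and descendByRef are hypothetical names invented for this illustration; they are not part of the code in the question. The wrapper counts how often it is copy-constructed while a function recurses a fixed number of levels.

```cpp
#include <string>

// Hypothetical wrapper that counts how often it is copy-constructed.
struct CountedWord {
    std::string text;
    static int copies;
    explicit CountedWord(const std::string& t) : text(t) {}
    CountedWord(const CountedWord& other) : text(other.text) { ++copies; }
};
int CountedWord::copies = 0;

// By value: every nested call copy-constructs a fresh parameter object,
// and each of those objects lives in its own stack frame.
void descendByValue(CountedWord word, int levels)
{
    if (levels > 0)
        descendByValue(word, levels - 1);
}

// By const reference: every level refers to the caller's single object.
void descendByRef(const CountedWord& word, int levels)
{
    if (levels > 0)
        descendByRef(word, levels - 1);
}
```

Calling descendByValue(w, 10) makes 11 calls and therefore 11 copies, while descendByRef(w, 10) makes none. In a deep tree, that difference is paid once per level.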
Also consider adding a statement such as std::cout << "recursing addWord" << std::endl; to addWord so that you can see how deep it's going and whether it's terminating correctly.
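With a huge file, printing on every call can flood the console, so a variation is to record the deepest level reached and print it once at the end. The sketch below rests on assumptions: the node layout is inferred from the question, addWordTracked and its two extra parameters are hypothetical, and it uses C++11 syntax. It also shows why depth matters: if words arrive in sorted order, an unbalanced BST degenerates into a chain and the recursion depth grows with the number of words.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

struct Line;  // assumption: the asker's class, only used through a pointer here

// Assumed node layout, inferred from the members addWord touches in the question.
struct BSTIndexNode {
    std::string word;
    Line* pData = nullptr;
    BSTIndexNode* pLeft = nullptr;
    BSTIndexNode* pRight = nullptr;
};

// Recursive insert (hypothetical name) that also records the deepest level
// reached so far in maxDepth. Start each top-level call with depth = 0.
void addWordTracked(BSTIndexNode*& pCurrentRoot, const std::string& word, Line* pLine,
                    std::size_t depth, std::size_t& maxDepth)
{
    maxDepth = std::max(maxDepth, depth);
    if (pCurrentRoot == nullptr) {  // empty slot: insert here
        pCurrentRoot = new BSTIndexNode();
        pCurrentRoot->word = word;
        pCurrentRoot->pData = pLine;
        return;
    }
    if (word < pCurrentRoot->word)  // go left
        addWordTracked(pCurrentRoot->pLeft, word, pLine, depth + 1, maxDepth);
    else                            // go right
        addWordTracked(pCurrentRoot->pRight, word, pLine, depth + 1, maxDepth);
}
```

If the reported maximum depth is close to the number of words inserted, the tree is badly unbalanced and deep recursion is the likely cause of the overflow.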