There's something really weird going on: strcmp() returns -1 though both strings are exactly the same. Here is a snippet from the output of the debugger (gdb):
(gdb) print s[i][0] == grammar->symbols_from_int[107][0]
$36 = true
(gdb) print s[i][1] == grammar->symbols_from_int[107][1]
$37 = true
(gdb) print s[i][2] == grammar->symbols_from_int[107][2]
$38 = true
(gdb) print s[i][3] == grammar->symbols_from_int[107][3]
$39 = true
(gdb) print s[i][4] == grammar->symbols_from_int[107][4]
$40 = true
(gdb) print s[i][5] == grammar->symbols_from_int[107][5]
$41 = false
(gdb) print grammar->symbols_from_int[107][4]
$42 = 0 '\0'
(gdb) print s[i]
$43 = (char * const&) @0x202dc50: 0x202d730 "Does"
(gdb) print grammar->symbols_from_int[107]
$44 = (char * const&) @0x1c9fb08: 0x1c9a062 "Does"
(gdb) print strcmp(s[i],grammar->symbols_from_int[107])
$45 = -1
Any idea what's going on?
Thanks in advance,
Onur
Edit 1: Here are some snippets of my code:
# include <unordered_map> // Used as hash table
# include <stdlib.h>
# include <string.h>
# include <stdio.h>
# include <vector>
using namespace std;
using std::unordered_map;
using std::hash;
struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};
...
<some other code>
...
class BPCFG {
public:
char *symbols; // Character array holding all grammar symbols, with null characters ('\0') separating them.
char *rules; // Character array holding all rules, with null characters ('\0') separating them.
unordered_map<char *, int , hash<char *> , eqstr> int_from_symbols; // Hash table holding the grammar symbols and their integer indices as key/value pairs.
...
<some other code>
...
vector<char *> symbols_from_int; // Vector holding each grammar symbol at the position given by its integer index.
void load_symbols_from_file(const char *symbols_file);
};
void BPCFG::load_symbols_from_file(const char *symbols_file) {
char buffer[200];
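FILE *input = fopen(symbols_file, "r");
if(input == NULL) return; // Could not open the symbols file.
int symbol_index = 0;
while(fscanf(input, "%199s", buffer) == 1) { // %199s keeps each word within buffer[200].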
if(buffer[0] == '/')
strcpy(symbols + symbol_index, buffer+1);
else
strcpy(symbols + symbol_index, buffer);
symbols_from_int.push_back(symbols + symbol_index);
int_from_symbols[symbols+symbol_index] = symbols_from_int.size()-1;
probs.push_back(vector<double>());
hyperprobs.push_back(vector<double>());
rules_from_IntPair.push_back(vector<char *>());
symbol_index += strlen(symbols+symbol_index) + 1;
}
fclose(input);
}
This last function (BPCFG::load_symbols_from_file) seems to be the only place in my whole code where I modify symbols_from_int. Please tell me if you need more code; I'm not posting everything because it's hundreds of lines.
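The packing scheme used by load_symbols_from_file (one malloc'd buffer, words separated by '\0', and a vector of pointers into that buffer) can be sketched on its own as below. This is a minimal sketch under stated assumptions: pack_symbols is a hypothetical name, it takes the words from a vector<string> instead of reading them with fscanf so the layout is easy to check, and unlike the original loop it refuses to write past the end of the buffer (the original never compares symbol_index against symbols_length).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies each word into one packed buffer, separating
// words with '\0', and records a pointer to the start of each copy.
// Returns false instead of overflowing if the buffer is too small.
bool pack_symbols(const std::vector<std::string>& words,
                  char* buffer, std::size_t capacity,
                  std::vector<char*>& symbols_from_int) {
    std::size_t used = 0;
    for (const std::string& w : words) {
        // Mirrors the original: a leading '/' is stripped before storing.
        const char* src = (!w.empty() && w[0] == '/') ? w.c_str() + 1 : w.c_str();
        std::size_t len = std::strlen(src);
        if (used + len + 1 > capacity)
            return false;                          // would write past the buffer
        std::memcpy(buffer + used, src, len + 1);  // copies the '\0' as well
        symbols_from_int.push_back(buffer + used); // pointer into the packed buffer
        used += len + 1;                           // next word starts after the '\0'
    }
    return true;
}
```

Packing {"/Does", "it", "work"} into a 32-byte buffer yields pointers to "Does", "it" and "work", with "it" starting 5 bytes in, right after "Does\0".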
Edit 2: OK, I think I should add one more thing from my code. This is the constructor of BPCFG class:
BPCFG(int symbols_length, int rules_length, int symbol_count, int rule_count):
int_from_symbols(1.5*symbol_count),
IntPair_from_rules(1.5*rule_count),
symbol_after_dot(10*rule_count)
{
symbols = (char *)malloc(symbols_length*sizeof(char));
rules = (char *)malloc(rules_length*sizeof(char));
}
Edit 3: Here is the code on the path to the point of error. It's not compilable, but it shows where the code stepped through (I checked with next and step commands in the debugger that the code indeed follows this route):
BPCFG my_grammar(2000, 5500, 194, 187);
my_grammar.load_symbols_from_file("random_50_1_words_symbols.txt");
<some irrelevant code>
my_grammar.load_rules_from_file("random_50_1_words_grammar.txt", true);
<some irrelevant code>
my_grammar.load_symbols_after_dots();
BPCFGParser my_parser(&my_grammar);
BPCFGParser::Sentence s;
// (Sentence is defined in the BPCFGParser class with
// typedef vector<char *> Sentence;)
Edge e;
try {
my_parser.parse(s, e);
}
catch(char *e) {fprintf(stderr, "%s", e);}
void BPCFGParser::parse(const Sentence & s, Edge & goal_edge) {
/* Initializing the chart */
chart::active_sets.clear();
chart::passive_sets.clear();
chart::active_sets.resize(s.size());
chart::passive_sets.resize(s.size());
// initialize(sentence, goal);
try {
initialize(s, goal_edge);
}
catch (char *e) {
if(strcmp(e, UNKNOWN_WORD) == 0)
throw e;
}
<Does something more, but the execution does not come to this point>
}
void BPCFGParser::initialize(const Sentence & s, Edge & goal_edge) {
// create a new chart and new agendas
/* For now, we plan to do this during constructing the BPCFGParser object */
// for each word w:[start,end] in the sentence
// discoverEdge(w:[start,end])
Edge temp_edge;
for(int i = 0;i < s.size();i++) {
temp_edge.span.start = i;
temp_edge.span.end = i+1;
temp_edge.isActive = false;
/* Checking whether the given word is ever seen in the training corpus */
unordered_map<char *, int , hash<char *> , eqstr>::const_iterator it = grammar->int_from_symbols.find(s[i]);
if(it == grammar->int_from_symbols.end())
throw UNKNOWN_WORD;
<Does something more, but execution does not come to this point>
}
}
I ran the print commands in the debugger when execution reached the last
throw UNKNOWN_WORD;
statement. That is, I was stepping with next in GDB, and as soon as I saw this line, I ran all these print commands.
Thank you for your interest,
Onur
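One thing worth checking in the lookup inside initialize: int_from_symbols is declared with hash<char *>, and std::hash for a pointer type hashes the address, not the characters it points to. s[i] and symbols_from_int[107] live at different addresses (0x202d730 vs 0x1c9a062 in the gdb output), so they will generally hash to different buckets, and find() never gets as far as calling eqstr on them. A minimal sketch of a content-based hash, assuming C++17 for std::string_view; cstr_hash, cstr_eq and SymbolTable are illustrative names, not from the original code.

```cpp
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_map>

// Hashes the characters a C string points to, not the pointer value.
// std::hash<char*> uses the address, so two separate buffers that both
// hold "Does" would usually produce different hashes.
struct cstr_hash {
    std::size_t operator()(const char* s) const {
        return std::hash<std::string_view>{}(std::string_view(s));
    }
};

// Same idea as the original eqstr: keys are equal if their contents match.
struct cstr_eq {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Drop-in shape for int_from_symbols: char* keys, compared by content.
using SymbolTable = std::unordered_map<char*, int, cstr_hash, cstr_eq>;
```

With this table, inserting one "Does" buffer and then calling find() with a different buffer that also holds "Does" succeeds, because both the hash and the equality test look only at the characters.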
strcmp() in C/C++: this function compares its string arguments lexicographically, that is, character by character. It starts with the first character of each string and continues while the characters are equal, stopping at the first pair that differs or at the terminating null character.
strcmp returns a value less than 0, 0 (equal), or a value greater than 0. Only the sign is guaranteed; the result is not necessarily -1 or 1. One way to find this out is to read the man page (man strcmp).
strcmp compares two character strings (str1 and str2) using the platform's collating sequence (the IBM documentation quoted here says EBCDIC; in standard C the characters are compared as unsigned char values). The return value has the same relationship to 0 as str1 has to str2. If two strings are equal up to the point at which one terminates (that is, contains a null character), the longer string is considered greater.
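A short sketch of those rules; compare_sign is a hypothetical helper, not part of the standard library. Because only the sign of strcmp's result is specified, the sketch folds it to -1, 0 or 1. The key point for this question is that two different buffers holding the same characters compare equal, which is why a -1 for two "Does" strings looks wrong.

```cpp
#include <cstring>

// Hypothetical helper: reduces strcmp's result to -1, 0 or 1, since the
// standard only guarantees the sign, not the magnitude.
int compare_sign(const char* a, const char* b) {
    int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

For example, "Doe" sorts before "Does" (the shorter string ends first), and "Does" sorts after "Doer" because 's' comes after 'r'.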
This sounds like s is a pointer to an array that was on the stack and is overwritten as soon as a new function is called, i.e. strcmp(). What does the debugger say the strings are after the strcmp() call?
In recent Linux distributions strcmp is a symbol of type STT_GNU_IFUNC. This is not supported in the last release of GDB (7.2 at the time of writing). That may be the cause of your problem, although in your case the return value looks genuine.
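If GDB's handling of the IFUNC strcmp symbol is what produces the suspicious -1, one way to check is to do the comparison inside the program and have GDB call that instead. A minimal sketch, assuming you can add a function to the code; debug_strcmp is a hypothetical name. The strcmp call inside it is compiled into the program, so it goes through the library function the dynamic linker resolved, rather than GDB looking up the strcmp symbol itself.

```cpp
#include <cstring>

// Hypothetical debugging helper. extern "C" keeps the name unmangled so it
// is easy to type at the gdb prompt; the body simply forwards to strcmp.
extern "C" int debug_strcmp(const char* a, const char* b) {
    return std::strcmp(a, b);
}
```

Then, stopped at the same point, run print debug_strcmp(s[i], grammar->symbols_from_int[107]) in GDB. If that prints 0, the two strings really are equal and the problem lies elsewhere, for example in how int_from_symbols hashes its char * keys.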