
Why does this method throw a Segmentation fault?

I am using the jsmn JSON parser (source code) to extract some text from a JSON string. jsmn stores its results in tokens, but the tokens do not hold any data themselves; they only record the token boundaries within the JSON string. For example, jsmn will create tokens like:

  • Object [0..31]
  • String [3..7], String [12..16], String [20..23]
  • Number [27..29]
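
To make the offsets concrete, here is a minimal sketch of turning one token into a string. The `tok_t` struct and the `token_text` helper are hypothetical stand-ins written for this illustration, not jsmn's own types or API; the sketch assumes `end` points one past the last character of the token.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a jsmn-style token: offsets into the JSON
 * string, with `end` pointing one past the last character. */
typedef struct {
    int start;
    int end;
} tok_t;

/* Copy the characters a token refers to into a new NUL-terminated string.
 * Returns NULL for a NULL argument, an invalid range, or a failed
 * allocation. The caller frees the result. */
char *token_text(const char *json, const tok_t *tok)
{
    if (!json || !tok || tok->start < 0 || tok->end < tok->start)
        return NULL;

    size_t len = (size_t)(tok->end - tok->start);
    char *out = malloc(len + 1);
    if (!out)
        return NULL;

    memcpy(out, json + tok->start, len);
    out[len] = '\0';
    return out;
}
```

The helper does not check that `end` stays within the JSON string; it trusts that the offsets came from a successful parse of that same string.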

The following function retrieves the actual characters between those offsets (for string tokens):

char* getTextFromJSON(const char *json)
{
    if (!json) return NULL;

    json_parser p;
    #define N_TOKENS 15  // this normally would be at the start of the file
    jsontok_t tokens[N_TOKENS];

    initJsonParser(&p);
    int err = parseJson(&p, json, tokens, N_TOKENS);
    if (err) {
        fprintf(stdout, "Error parsing JSON: %d\n", err);
        return NULL;
    }
    for (int i = 0; i < N_TOKENS; ++i) {
        jsontok_t *key = &tokens[i];
        if (!memcmp("utterance", &json[key->start], (size_t) (key->end - key->start))) {
            ++key;
            return strndup(&json[key->start], (size_t)(key->end - key->start));
        }
    }
    return NULL;
}

Here are some JSON examples that would be thrown into the parser:

  • {"status":0,"id":"432eac38858968c108899cc6c3a4bade-1","hypotheses":[{"utterance":"test","confidence":0.84134156}]}
  • {"status":5,"id":"695118aaa3d01dc2ac4aa8054d1e5bb0-1","hypotheses":[]}

When I pass the first example to the method, I get the expected value of "test" back. However, when I pass the second example (the one with the empty hypotheses array), I get a segmentation fault on the 8th iteration of the for loop, at the if condition.

Any suggestions?

Here are the values for each iteration, in hex:

key->start: 0x00000000
key->end - key->start: 0x00000046
key->start: 0x00000002
key->end - key->start: 0x00000006
key->start: 0x0000000A
key->end - key->start: 0x00000001
key->start: 0x0000000D
key->end - key->start: 0x00000002
key->start: 0x00000012
key->end - key->start: 0x00000022
key->start: 0x00000037
key->end - key->start: 0x0000000A
key->start: 0x00000043
key->end - key->start: 0x00000002
key->start: 0x3A7B3188
key->end - key->start: 0x7A0F0766
asked Aug 06 '13 15:08 by syb0rg


2 Answers

EDIT: After looking at the source code, the parser fills the unused tokens like this:

for (i = parser->toknext; i < num_tokens; i++) {
    jsmn_fill_token(&tokens[i], JSMN_PRIMITIVE, -1, -1);
}

It initializes all of the structures, but ->start and ->end will equal -1 for the unused ones, which is why memcmp is failing.

for (int i = 0; i < N_TOKENS; ++i) {
    jsontok_t *key = &tokens[i];
    if (key->start == -1) return NULL;
    if (!memcmp("utterance", &json[key->start], (size_t) (key->end - key->start))) {
        ++key;
        return strndup(&json[key->start], (size_t)(key->end - key->start));
    }
}

Checking for a -1 value in ->start or ->end should be sufficient.
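
Two other reads in that loop look risky as well: memcmp uses the token's length rather than the length of "utterance", so a longer token makes it read past the end of the literal, and ++key can step past the last slot of the array. Below is a minimal sketch of a lookup that guards against both. `tok_t` and `find_value_index` are hypothetical names made up for this illustration, not part of jsmn, and the sketch assumes unused slots hold -1 as described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a jsmn-style token (end is exclusive). */
typedef struct {
    int start;
    int end;
} tok_t;

/* Return the index of the token that follows the first token whose text
 * equals `key`, or -1 if there is no such token. */
int find_value_index(const char *json, const tok_t *toks, int ntoks,
                     const char *key)
{
    size_t keylen = strlen(key);

    /* i + 1 < ntoks guarantees that toks[i + 1] is inside the array. */
    for (int i = 0; i + 1 < ntoks; ++i) {
        const tok_t *t = &toks[i];

        if (t->start < 0 || t->end < t->start)
            break; /* first unused slot: nothing more to look at */

        /* Compare lengths first so memcmp never reads past either buffer. */
        if ((size_t)(t->end - t->start) == keylen &&
            memcmp(json + t->start, key, keylen) == 0) {
            const tok_t *v = &toks[i + 1];
            if (v->start < 0 || v->end < v->start)
                return -1; /* key was the last filled token */
            return i + 1;
        }
    }
    return -1;
}
```

Like the original loop, this matches any token whose text equals the key, whether it is an object key or a string value; telling the two apart would need the token type as well.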

answered Sep 27 '22 20:09 by Louis Ricci


Your tokens[] array is uninitialized before you pass it to parseJson(), so once you iterate beyond the last token (the seventh one in your second example) you're trying to run memcmp() on uninitialized nonsense address values. That's causing your seg fault. Initialize tokens[] to something and then check for that initialization value in the start/end fields during your for() loop.

For example, I'd probably initialize tokens[] to zero (via memset(&tokens, 0, sizeof(tokens));) and during each iteration of the loop check for length zero (key->end - key->start) to see if the token is actually valid before passing it to memcmp(). Bail out of the loop with a break; if the token has length zero.

(Or, if a token can have a legitimate zero-length, use some other value.)
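
As one possible take on the "some other value" variant, the sketch below marks every slot with -1 before parsing, so that an empty string token (start equal to end) still counts as valid. `tok_t`, `init_tokens` and `count_used` are hypothetical names invented for this illustration and are not part of jsmn.

```c
#include <stddef.h>

/* Hypothetical stand-in for a jsmn-style token (end is exclusive). */
typedef struct {
    int start;
    int end;
} tok_t;

/* Mark every slot as unused before handing the array to the parser. */
void init_tokens(tok_t *toks, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        toks[i].start = -1;
        toks[i].end = -1;
    }
}

/* Count the leading slots that hold a valid range, i.e. the slots the
 * parser filled in. A zero-length token counts as valid. */
size_t count_used(const tok_t *toks, size_t n)
{
    size_t used = 0;
    while (used < n &&
           toks[used].start >= 0 &&
           toks[used].end >= toks[used].start)
        ++used;
    return used;
}
```

The search loop can then run up to count_used(tokens, N_TOKENS) instead of N_TOKENS, so it never touches a slot the parser did not write.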

answered Sep 27 '22 21:09 by Andrew Cottrell