
Segmentation fault when using regexec/strtok_r in C

I'm having trouble figuring out where and why I'm getting a segmentation fault.

I'm writing a C program that prompts the user for a regular expression, compiles it, and then reads a string containing multiple sentences:

int main(void){

  char RegExp[50];
  regex_t CompiledRegExp;
  char *para;
  char delim[] = ".!?,";
  char *sentence;
  char *ptr1;

  printf("Enter regular expression: ");
  fgets(RegExp, 50, stdin);

  if (regcomp(&CompiledRegExp, RegExp, REG_EXTENDED | REG_NOSUB) != 0) {
    printf("ERROR: Something wrong in the regular expression\n");
    exit(EXIT_FAILURE);
  }

  printf("\nEnter string: ");

strtok_r splits the string on any of the delimiters . , ? ! and each resulting token (a sentence) is then passed as the string argument to regexec, which checks whether the token matches the previously compiled regular expression:

  if (fgets(para, 1000, stdin)) {

    char *ptr = para;
    sentence = strtok_r(ptr, delim, &ptr1);

    while(sentence != NULL){

      printf("\n%s", sentence);

      if (regexec(&CompiledRegExp,sentence,(size_t)0,NULL,0) == 0) {
        printf("\nYes");
      } else {
        printf("\nNo");
      }
      ptr = ptr1;
      sentence = strtok_r(ptr, delim, &ptr1);

    }
  }
  regfree(&CompiledRegExp);
}

It's probably a silly mistake on my part, but any help in locating the cause of the segfault would be greatly appreciated!

EDIT: Moved regfree to a more suitable location. However, the segfault still occurs. I'm pretty sure it has something to do with either how the regular expression is read in or how it is compared in regexec. I'm clueless, though.

higz555 asked Apr 22 '16 22:04


2 Answers

Instead of this:

char *para;
fgets(para, 1000, stdin);

Write this:

char para[1000];
fgets(para, 1000, stdin);

In the first variant, para is an uninitialized pointer: it points to some arbitrary location in memory, and fgets writes the user-entered string there. Most likely that address is invalid, so the program crashes immediately. In the second variant, para is an array of 1000 bytes that fgets can safely write into.
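
To show the array-backed buffer in context, here is a minimal sketch of a line reader that only ever writes into storage the caller owns. The name read_line and its return convention are assumptions made for this illustration, not part of the original code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into a caller-owned buffer.
   The caller must pass real storage (for example `char para[1000]`),
   never an uninitialized pointer.
   Returns 1 on success, 0 on end of input or read error. */
int read_line(char *buf, size_t size, FILE *in)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return 0;

    /* fgets keeps the trailing newline; cut the string there if present. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

With `char para[1000];` declared in main, the call would look like `read_line(para, sizeof para, stdin)`. Stripping the newline is optional for the crash itself, but it is probably also worth doing for RegExp, because otherwise the newline kept by fgets becomes part of the pattern passed to regcomp.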

Roland Illig answered Nov 04 '22 10:11


You called regfree inside the loop. The second time around the loop, you call regexec on a freed regex_t, which is undefined behavior.
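
The lifetime rule can be sketched as a function that compiles the pattern once, runs regexec on every token, and calls regfree exactly once after the loop has finished. The function name count_matching_sentences and its return convention are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `text` on the sentence delimiters and count
   the tokens that match `pattern`.
   `text` is modified in place, because strtok_r writes '\0' into it.
   Returns -1 if the pattern does not compile. */
int count_matching_sentences(const char *pattern, char *text)
{
    static const char delim[] = ".!?,";
    regex_t re;
    char *save = NULL;
    int matches = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* The compiled regex stays valid for the whole loop. */
    for (char *s = strtok_r(text, delim, &save); s != NULL;
         s = strtok_r(NULL, delim, &save)) {
        if (regexec(&re, s, 0, NULL, 0) == 0)
            matches++;
    }

    /* Free once, only after the last regexec call. */
    regfree(&re);
    return matches;
}
```

Keeping regcomp and regfree in the same function, on either side of the loop, makes it harder to free the regex too early by accident.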

Joshua answered Nov 04 '22 10:11