 

double free or corruption when running multithreaded

I am getting a runtime error, "double free or corruption", in my C++ program. The program calls a reliable library, ANN, and uses OpenMP to parallelize a for loop:

*** glibc detected *** /home/tim/test/debug/test: double free or corruption (!prev): 0x0000000002527260 ***     

Does it mean that the memory at address 0x0000000002527260 is freed more than once?

The error happens at "_search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);" inside the function classify_various_k(), which is in turn called from the OpenMP for loop inside the function tune_complexity().

Note that the error only happens when OpenMP runs with more than one thread; it does not happen in the single-thread case. I am not sure why.

Following is my code. If it is not enough for a diagnosis, just let me know. Thanks for your help!

    void KNNClassifier::train(int nb_examples, int dim, double **features, int *labels) {
        _nPts = nb_examples;

        _labels = labels;
        _dataPts = features;

        setting_ANN(_dist_type, 1);

        delete _search_struct;
        if (strcmp(_search_neighbors, "brutal") == 0) {
            _search_struct = new ANNbruteForce(_dataPts, _nPts, dim);
        } else if (strcmp(_search_neighbors, "kdtree") == 0) {
            _search_struct = new ANNkd_tree(_dataPts, _nPts, dim);
        }
    }


    void KNNClassifier::classify_various_k(int dim, double *feature, int label, int *ks, double *errors, int nb_ks, int k_max) {
        ANNpoint      queryPt = 0;
        ANNidxArray   nnIdx = 0;
        ANNdistArray  dists = 0;

        queryPt = feature;
        nnIdx = new ANNidx[k_max];
        dists = new ANNdist[k_max];

        if (strcmp(_search_neighbors, "brutal") == 0) {
            _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);
        } else if (strcmp(_search_neighbors, "kdtree") == 0) {
            _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps); // where the error occurs
        }

        for (int j = 0; j < nb_ks; j++)
        {
            scalar_t result = 0.0;
            for (int i = 0; i < ks[j]; i++) {
                result += _labels[ nnIdx[i] ];
            }
            if (result * label < 0) errors[j]++;
        }

        delete [] nnIdx;
        delete [] dists;
    }

    void KNNClassifier::tune_complexity(int nb_examples, int dim, double **features, int *labels, int fold, char *method, int nb_examples_test, double **features_test, int *labels_test) {
        int nb_try = (_k_max - _k_min) / scalar_t(_k_step);
        scalar_t *error_validation = new scalar_t[nb_try];
        int *ks = new int[nb_try];

        for (int i = 0; i < nb_try; i++) {
            ks[i] = _k_min + _k_step * i;
        }

        if (strcmp(method, "ct") == 0)
        {
            train(nb_examples, dim, features, labels); // train once for all numbers of neighbors in ks

            for (int i = 0; i < nb_try; i++) {
                if (ks[i] > nb_examples) { nb_try = i; break; }
                error_validation[i] = 0;
            }

            int i = 0;
    #pragma omp parallel shared(nb_examples_test, error_validation, features_test, labels_test, nb_try, ks) private(i)
            {
    #pragma omp for schedule(dynamic) nowait
                for (i = 0; i < nb_examples_test; i++)
                {
                    classify_various_k(dim, features_test[i], labels_test[i], ks, error_validation, nb_try, ks[nb_try - 1]); // where the error occurs
                }
            }
            for (i = 0; i < nb_try; i++)
            {
                error_validation[i] /= nb_examples_test;
            }
        }

        ......
    }

UPDATE:

Thanks! I am now trying to fix the problem of multiple threads writing to the same memory in classify_various_k() by using "#pragma omp critical":

    void KNNClassifier::classify_various_k(int dim, double *feature, int label, int *ks, double *errors, int nb_ks, int k_max) {
        ANNpoint      queryPt = 0;
        ANNidxArray   nnIdx = 0;
        ANNdistArray  dists = 0;

        queryPt = feature; //for (int i = 0; i < Vignette::size; i++){ queryPt[i] = vignette->content[i];}
        nnIdx = new ANNidx[k_max];
        dists = new ANNdist[k_max];

        if (strcmp(_search_neighbors, "brutal") == 0) { // search
            _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);
        } else if (strcmp(_search_neighbors, "kdtree") == 0) {
            _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);
        }

        for (int j = 0; j < nb_ks; j++)
        {
            scalar_t result = 0.0;
            for (int i = 0; i < ks[j]; i++) {
                result += _labels[ nnIdx[i] ];  // Program received signal SIGSEGV, Segmentation fault
            }
            if (result * label < 0)
            {
    #pragma omp critical
            {
                errors[j]++;
            }
            }
        }

        delete [] nnIdx;
        delete [] dists;
    }

However, there is now a new segmentation fault at "result += _labels[ nnIdx[i] ];". Any ideas? Thanks!

asked Feb 02 '10 by Tim


2 Answers

Okay, since you've stated that it works correctly in the single-thread case, "normal" methods won't find this. You need to do the following:

  • find all variables that are accessed in parallel
  • pay special attention to those that are modified
  • don't call delete on a shared resource
  • take a look at all library functions that operate on shared resources, and check whether they allocate or deallocate memory internally

This is the list of candidates that may be double-deleted:

shared(nb_examples_test, error_validation,features_test, labels_test, nb_try, ks)

Also, this code might not be thread-safe:

      for (int i = 0; i < ks[j]; i++) {
          result += _labels[ nnIdx[i] ];
      }
      if (result * label < 0) errors[j]++;

because two or more threads may try to write to the errors array at the same time.

And one big piece of advice: while in threaded code, try not to access (and especially not to modify!) anything that is not a parameter of the function!

answered Nov 10 '22 by Kornel Kisielewicz


I don't know if this is your problem, but:

void KNNClassifier::train(int nb_examples, int dim, double **features, int *labels) {
  ...
  delete _search_struct;
  if (strcmp(_search_neighbors, "brutal") == 0) {
    _search_struct = new ANNbruteForce(_dataPts, _nPts, dim);
  } else if (strcmp(_search_neighbors, "kdtree") == 0) {
    _search_struct = new ANNkd_tree(_dataPts, _nPts, dim);
  }
}

What happens if you don't fall into either the if or the else if clauses? You've deleted _search_struct and left it pointing to garbage. You should set it to NULL afterward.

If this isn't the problem, you could try replacing:

delete p;

with:

assert(p != NULL);
delete p;
p = NULL;

(or similarly for delete[] sites). (This probably would pose a problem for the first invocation of KNNClassifier::train, however.)

Also, obligatory: do you really need to do all of these manual allocations and deallocations? Why aren't you at least using std::vector instead of new[]/delete[] (which are almost always bad)?

answered Nov 10 '22 by jamesdlin