Multithreaded C Lua module leading to segfault in Lua script


I've written a very simple C library for Lua. It consists of a single function that starts a thread, and that thread does nothing but loop:

#include "lua.h"
#include "lauxlib.h"
#include <pthread.h>
#include <stdio.h>

pthread_t handle;
void* mythread(void* args)
{
    printf("In the thread !\n");
    while(1);
    pthread_exit(NULL);
}

int start_mythread()
{
    return pthread_create(&handle, NULL, mythread, NULL);
}

int start_mythread_lua(lua_State* L)
{
    lua_pushnumber(L, start_mythread());
    return 1;
}

static const luaL_Reg testlib[] = {
    {"start_mythread", start_mythread_lua},
    {NULL, NULL}
};

int luaopen_test(lua_State* L)
{
/*
    //for lua 5.2
    luaL_newlib(L, testlib);
    lua_setglobal(L, "test");
*/
    luaL_register(L, "test", testlib);
    return 1;
}


Now, if I write a very simple Lua script that just does:

require("test")
test.start_mythread()

Running the script with lua myscript.lua will sometimes cause a segfault. Here's what GDB has to say about the core dump:

Program terminated with signal 11, Segmentation fault.
#0  0xb778b75c in ?? ()
(gdb) thread apply all bt

Thread 2 (Thread 0xb751c940 (LWP 29078)):
#0  0xb75b3715 in _int_free () at malloc.c:4087
#1  0x08058ab9 in l_alloc ()
#2  0x080513a2 in luaM_realloc_ ()
#3  0x0805047b in sweeplist ()
#4  0x080510ef in luaC_freeall ()
#5  0x080545db in close_state ()
#6  0x0804acba in main () at lua.c:389

Thread 1 (Thread 0xb74efb40 (LWP 29080)):
#0  0xb778b75c in ?? ()
#1  0xb74f6efb in start_thread () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#2  0xb7629dfe in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129

The stack of the main thread varies a little from run to run.
It seems the start_thread function wants to jump to a given address (in this instance, 0xb778b75c) that sometimes happens to lie in unmapped memory.
Edit: I also have Valgrind output:

==642== Memcheck, a memory error detector
==642== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==642== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==642== Command: lua5.1 go.lua
==642== 
In the thread !
In the thread !
==642== Thread 2:
==642== Jump to the invalid address stated on the next line
==642==    at 0x403677C: ???
==642==    by 0x46BEEFA: start_thread (pthread_create.c:309)
==642==    by 0x41C1DFD: clone (clone.S:129)
==642==  Address 0x403677c is not stack'd, malloc'd or (recently) free'd
==642== 
==642== 
==642== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==642==  Access not within mapped region at address 0x403677C
==642==    at 0x403677C: ???
==642==    by 0x46BEEFA: start_thread (pthread_create.c:309)
==642==    by 0x41C1DFD: clone (clone.S:129)
==642==  If you believe this happened as a result of a stack
==642==  overflow in your program's main thread (unlikely but
==642==  possible), you can try to increase the size of the
==642==  main thread stack using the --main-stacksize= flag.
==642==  The main thread stack size used in this run was 8388608.
==642== 
==642== HEAP SUMMARY:
==642==     in use at exit: 1,296 bytes in 6 blocks
==642==   total heap usage: 515 allocs, 509 frees, 31,750 bytes allocated
==642== 
==642== LEAK SUMMARY:
==642==    definitely lost: 0 bytes in 0 blocks
==642==    indirectly lost: 0 bytes in 0 blocks
==642==      possibly lost: 136 bytes in 1 blocks
==642==    still reachable: 1,160 bytes in 5 blocks
==642==         suppressed: 0 bytes in 0 blocks
==642== Rerun with --leak-check=full to see details of leaked memory
==642== 
==642== For counts of detected and suppressed errors, rerun with: -v
==642== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Killed


However, I've been fine so far when I just open the Lua interpreter and enter the same instructions manually, one after the other.
Also, a C program that does the same thing using the same lib has, as it should, never failed during my tests:

int start_mythread();

int main()
{
    int ret = start_mythread();
    return ret;
}

I've tried with both Lua 5.1 and 5.2, to no avail.
Edit: I should point out that I tested this on a single-core eeePC running 32-bit Debian Wheezy (Linux 3.2).
I've just tested again on my main machine (4-core, 64-bit Arch Linux), and launching the script with lua myscript.lua segfaults every time there. Entering the commands at the interpreter prompt works fine though, as does the C program above.

I wrote this small lib in the first place because I'm writing a bigger library, which is where I first hit this problem. After hours of fruitless debugging, including removing every shared structure and variable one by one (yes, I was that desperate), I've narrowed it down to this piece of code.
So my guess is that I'm doing something wrong with Lua, but what could that be? I've searched for this issue as much as I could, but what I found was mostly people having problems using the Lua API from several threads (which isn't what I'm trying to do here).
If you have an idea, any help would be much appreciated.

Edit
To be more precise, I'd like to know whether I should take extra precautions with threads when writing a C lib for use within Lua scripts. Does Lua need threads created from within a dynamically loaded library to be terminated when it "unloads" the library?

asked Feb 15 '15 at 15:02 by ranjak





1 Answer

Why does the segfault happen in the Lua module?

Your Lua script exits before the thread has finished, and that is what causes the segfault. During normal interpreter shutdown the Lua module is unloaded with dlclose(), which removes the thread's instructions from memory; the still-running thread then segfaults when it tries to read its next instruction.

What are the options?

Any solution which stops the threads before the module is unloaded will work. Using pthread_join() in the main thread will wait for the threads to finish (you may want to kill long-running threads using pthread_cancel()). Calling pthread_exit() in the main thread before the module is unloaded will also prevent the crash (because it will prevent the dlclose()), but it also aborts the normal cleanup/shutdown procedure of the Lua interpreter.

Here are some examples that work:

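#include <math.h>   /* for pow() */

/* Terminates the calling (main) thread without unloading the module. */
int pexit(lua_State* L)
{
    pthread_exit(NULL);
    return 0;   /* never reached */
}

/* Blocks until the thread started by start_mythread() has finished. */
int join(lua_State* L)
{
    pthread_join(handle, NULL);
    return 0;
}

static const luaL_Reg testlib[] = {
    {"start_mythread", start_mythread_lua},
    {"join", join},
    {"exit", pexit},
    {NULL, NULL}
};

/* Replaces the endless loop from the question with a long but finite one. */
void* mythread(void* args)
{
    int i, j, k;
    printf("In the thread !\n");
    for (i = 0; i < 10000; ++i) {
        for (j = 0; j < 10000; ++j) {
            for (k = 0; k < 10; ++k) {
                pow(1, i);
            }
        }
    }
    pthread_exit(NULL);
}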

Now the script will exit nicely:

require('test')
test.start_mythread()
print("launched thread")
test.join() -- or test.exit()
print("thread joined")
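
Note that join() only returns once the thread function does, so it would block forever on the question's original while(1); loop. pthread_cancel() with the default deferred cancellation type would not help there either, because it only takes effect at a cancellation point and an empty loop never reaches one. One way around that is a stop flag that the thread polls. A minimal sketch, assuming C11 atomics are available; stop_requested, running and stop_mythread are names made up for illustration and are not part of the original module:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_t handle;
static atomic_bool stop_requested;  /* hypothetical flag, set by the main thread */
static int running;                 /* only read and written by the main thread */

static void *mythread(void *args)
{
    (void)args;
    printf("In the thread !\n");
    /* Poll the flag instead of spinning forever, so the loop can end. */
    while (!atomic_load(&stop_requested))
        sched_yield();
    return NULL;
}

/* Starts the thread once; returns the pthread_create() result. */
int start_mythread(void)
{
    int ret;
    if (running)
        return 0;
    atomic_store(&stop_requested, false);
    ret = pthread_create(&handle, NULL, mythread, NULL);
    running = (ret == 0);
    return ret;
}

/* Asks the thread to stop and waits for it; safe to call more than once. */
int stop_mythread(void)
{
    int ret;
    if (!running)
        return 0;
    atomic_store(&stop_requested, true);
    ret = pthread_join(handle, NULL);
    running = 0;
    return ret;
}
```

A Lua wrapper around stop_mythread() could then be registered in testlib in place of join, and the script would call it before exiting.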

To automate this, you can tie into the garbage collector, since all the objects belonging to the module are freed before the shared object is unloaded (as greatwolf suggested).

"Discussion on calling pthread_exit() from main(): There is a definite problem if main() finishes before the threads it spawned if you don't call pthread_exit() explicitly. All of the threads it created will terminate because main() is done and no longer exists to support the threads. By having main() explicitly call pthread_exit() as the last thing it does, main() will block and be kept alive to support the threads it created until they are done."

(This quote is a bit misleading: Returning from main() is roughly equivalent to calling exit(), which will quit the process including all running threads. This may or may not be exactly the behavior you want. Calling pthread_exit() in the main thread on the other hand will quit the main thread but keep all other threads running until they stop on their own or somebody else kills them. Again, this may or may not be the behavior you want. There is no problem unless you choose the wrong option for your use case.)
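
The difference between the two ways of leaving main() can be observed with a small experiment. The sketch below is my own illustration (worker_survives and report_fd are made-up names): it forks a child that stands in for a program's main(), starts a thread that reports through a pipe after a short delay, and then ends the child's main thread either with exit(), which is roughly what returning from main() does, or with pthread_exit().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static int report_fd = -1;  /* write end of the pipe, used by the worker */

static void *worker(void *arg)
{
    struct timespec delay = {0, 200 * 1000 * 1000};  /* 200 ms */
    ssize_t n;
    (void)arg;
    nanosleep(&delay, NULL);          /* still busy when the main thread leaves */
    n = write(report_fd, "x", 1);     /* report that the worker got to finish */
    (void)n;
    return NULL;
}

/* Returns 1 if the worker finished, 0 if it did not, -1 on error. */
int worker_survives(int use_pthread_exit)
{
    int fds[2];
    char byte;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                     /* do not duplicate buffered output */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: plays the role of a program's main(). */
        pthread_t t;
        close(fds[0]);
        report_fd = fds[1];
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            _exit(2);
        if (use_pthread_exit)
            pthread_exit(NULL);       /* ends only this thread */
        exit(0);                      /* ends the process, worker included */
    }
    close(fds[1]);
    n = read(fds[0], &byte, 1);       /* 1 byte from the worker, or 0 at EOF */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

I would expect worker_survives(1) to return 1 and worker_survives(0) to return 0: with pthread_exit() the process lives on until the worker is done, whereas exit() takes the sleeping worker down with it.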

answered Oct 15 '22 at 04:10 (9 revs, 2 users 97%)
