
Why is our MonoTouch app breaking in the garbage collector? It is not out of memory

We have a simple question, but the cause is complicated. We are experienced developers and have done a lot of research into what may be causing it. We are hoping that MonoTouch developers can work with us to identify what appears to be a common problem for which no solution exists yet. We have been working on this for over two weeks and have not been able to resolve it.

The question is: Why is our MonoTouch app breaking in the garbage collector? It is not out of memory.

The situation is that we have an app that checks a web service regularly (perhaps every 5 seconds). After a period of time it fails with a memory management abort. This typically happens after about an hour and a half, but can be anywhere from ten minutes to overnight. It happens on all of our test devices (we have 7 in total, covering iOS 3 and iOS 4 on iPod Touches, iPhones and iPads 1 and 2). After looking on StackOverflow, we added a System.GC.Collect call in a timer before we take any action. This improved things a little (it takes longer to fail), but the failure did not go away. It is also worth adding that the memory log from the iPad shows 777 free blocks and 2041 in use by our app, with a total of 26488 wired pages. Since we have just garbage collected, and are not doing anything different from what we did 5 seconds before, it seems odd to run out of memory.

We upgraded to MonoTouch 4.0.1 but that has not fixed it.

StackOverflow questions that may describe the same issue, but do not answer it: 5666905 / 4545383 / 5492469 / 5426733

The stack at failure on an iPad2 is below. The failure can happen in the main thread or an http thread, but it always goes through this GC_ sequence. I have included the code for the memory manager GC_remap below, with discussion.

Thread 10 Crashed:
0   libsystem_kernel.dylib  0x34b4da1c __pthread_kill + 8
1   libsystem_c.dylib       0x3646a3b4 pthread_kill + 52
2   libsystem_c.dylib       0x36462bf8 abort + 72
3   MyApp                   0x004ca92c mono_handle_native_sigsegv (mini-exceptions.c:2249)
4   MyApp                   0x004f2208 sigabrt_signal_handler (mini-posix.c:195)
5   libsystem_c.dylib       0x36475728 _sigtramp + 36
6   libsystem_c.dylib       0x3646a3b4 pthread_kill + 52
7   libsystem_c.dylib       0x36462bf8 abort + 72
8   MyApp                   0x0061dc94 GC_remap (os_dep.c:2092)
9   MyApp                   0x00611678 GC_allochblk_nth (allchblk.c:730)
10  MyApp                   0x00611028 GC_allochblk (allchblk.c:561)
11  MyApp                   0x0061d0e0 GC_new_hblk (new_hblk.c:253)
12  MyApp                   0x006133d0 GC_allocobj (alloc.c:1116)
13  MyApp                   0x00617d30 GC_generic_malloc_inner (malloc.c:136)
14  MyApp                   0x00617f40 GC_generic_malloc (malloc.c:192)
15  MyApp                   0x00618264 GC_malloc_atomic (malloc.c:262)
16  MyApp                   0x005a46d4 mono_object_allocate_ptrfree (object.c:4221)
17  MyApp                   0x005a4aa0 mono_string_new_size (object.c:4848)
18  MyApp                   0x005c1b14 ves_icall_System_String_InternalAllocateStr (string-icalls.c:213)
19  MyApp                   0x002d34c4 wrapper_managed_to_native_string_InternalAllocateStr_int + 52
20  MyApp                   0x002cff5c string_ToLower_System_Globalization_CultureInfo + 56
21  MyApp                   0x003e6ac0 System_Net_WebRequest_GetCreator_string + 40
22  MyApp                   0x003e694c System_Net_WebRequest_Create_System_Uri + 48
23  MyApp                   0x003e68d8 System_Net_WebRequest_Create_string + 64
24  MyApp                   0x004489c4 MyApp_Services_Client_GetResponseContent_string + 152
25  MyApp                   0x00446288 MyApp_Services_Client_GetCurrentQuestion_long_long + 916
26  MyApp                   0x00196fcc MyApp_Iphone_RootViewController_RetrieveCurrentQuestion + 868
27  MyApp                   0x002e6368 System_Threading_Thread_StartUnsafe + 168
28  MyApp                   0x00306890 wrapper_runtime_invoke_object_runtime_invoke_dynamic_intptr_intptr_intptr_intptr + 192
29  MyApp                   0x004b0274 mono_jit_runtime_invoke (mini.c:5746)
30  MyApp                   0x0059f924 mono_runtime_invoke (object.c:2756)
31  MyApp                   0x005a1350 mono_runtime_delegate_invoke (object.c:3421)
32  MyApp                   0x005ca884 start_wrapper_internal (threads.c:788)
33  MyApp                   0x005ca924 start_wrapper (threads.c:830)
34  MyApp                   0x005ef4b8 thread_start_routine (wthreads.c:285)
35  MyApp                   0x0061f1d0 GC_start_routine (pthread_support.c:1468)
36  libsystem_c.dylib       0x3646a30a _pthread_start + 242
37  libsystem_c.dylib       0x3646bbb4 thread_start + 0

This is the GC_remap code that appears to be the point of failure, from https://github.com/mono/mono/blob/master/libgc/os_dep.c

#ifdef NACL
      {
        /* NaCl doesn't expose mprotect, but mmap should work fine */
        void * mmap_result;
        mmap_result = mmap(start_addr, len, PROT_READ | PROT_WRITE | OPT_PROT_EXEC,
                           MAP_PRIVATE | MAP_FIXED | OPT_MAP_ANON,
                           zero_fd, 0/* offset */);
        if (mmap_result != (void *)start_addr) ABORT("mmap as mprotect failed");
        /* Fake the return value as if mprotect succeeded. */
        result = 0;
      }
#else /* NACL */
      result = mprotect(start_addr, len,
                        PROT_READ | PROT_WRITE | OPT_PROT_EXEC);
#endif /* NACL */
      if (result != 0) {
        GC_err_printf3(
          "Mprotect failed at 0x%lx (length %ld) with errno %ld\n",
          start_addr, len, errno);
        ABORT("Mprotect remapping failed");
      }
      GC_unmapped_bytes -= len;

It would appear that the ABORT is caused by the mprotect function failing. We have been unable to get the failure code as the problem does not manifest itself on the simulator. The mprotect function appears to just mark the memory as accessible for read/write/execute. How is the memory manager passing parameters that cause it to fail? Could it be passing an incorrect pointer, or an incorrect length? Or are certain areas or boundaries handled differently on iOS?
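
On the pointer and boundary part of that question: mprotect is documented to reject an address that is not a multiple of the page size with EINVAL. The following is a minimal C sketch of how the arguments and the resulting error code could be checked in a standalone test program. It assumes a POSIX system; is_page_aligned and mprotect_errno are hypothetical helper names used only for illustration and are not part of libgc or Mono.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from libgc): returns 1 if addr lies on a
 * page boundary, 0 otherwise. */
static int is_page_aligned(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return ((uintptr_t)addr % (uintptr_t)page) == 0;
}

/* Hypothetical helper (not from libgc): calls mprotect with read/write
 * permissions and returns 0 on success, or the errno value on failure,
 * so the caller can record exactly why the call was rejected. */
static int mprotect_errno(void *addr, size_t len)
{
    if (mprotect(addr, len, PROT_READ | PROT_WRITE) != 0)
        return errno;
    return 0;
}
```

Given a page p obtained from mmap, mprotect_errno(p, page_size) would be expected to return 0, while mprotect_errno(p + 1, 1) would be expected to return EINVAL because the address is no longer page-aligned.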

The code at https://github.com/mono/mono/blob/master/libgc/allchblk.c for GC_allochblk_nth implies that the GC_remap function is only called if the memory block found was valid. (This file doesn't quite match the line numbers of the stack trace, so presumably it is not exactly the same file.)

http://developer.apple.com/library/ios/#documentation/System/Conceptual/ManPages_iPhoneOS/man2/mprotect.2.html says that it might fail with EACCES, EINVAL, ENOTSUP which are 13, 22, & 45 respectively. One of the reports on SO says that they get an error 12 (ENOMEM). I'm not sure what that means, as mprotect shouldn't be allocating memory, and the documentation doesn't say that is valid.

More generic documentation at http://linux.die.net/man/2/mprotect indicates that ENOMEM can be caused by "Internal kernel structures could not be allocated. Or: addresses in the range [addr, addr+len] are invalid for the address space of the process, or specify one or more pages that are not mapped." How could this be?
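
The "pages that are not mapped" case quoted above can be illustrated with a minimal C sketch: map one page, unmap it, then call mprotect on the stale range. The Linux man page documents ENOMEM for this situation; whether iOS reports the same code is an assumption that this sketch does not verify. mprotect_on_unmapped_page is a hypothetical name, and the sketch says nothing about whether this is what actually happens inside GC_remap.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe (not from libgc): maps one anonymous page, unmaps
 * it again, then calls mprotect on the now-unmapped range.
 * Returns the errno reported by mprotect, 0 if mprotect unexpectedly
 * succeeded, or -1 if the setup steps themselves failed. */
static int mprotect_on_unmapped_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* After this call the range [p, p + page) is no longer mapped. */
    if (munmap(p, (size_t)page) != 0)
        return -1;

    if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
        return 0;
    return errno;
}
```

In a single-threaded test program the return value can be compared against ENOMEM. In a multi-threaded process another thread could map the same address between the munmap and the mprotect, so the result there is less predictable.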

We would much appreciate any suggestions on how we might move this forward. Our app is entirely C# code, and does nothing other than a periodic https read. What can we do to improve debugging? (We can't trace anything, as the app is killed by iOS.) We have tried creating a simpler demonstration, but it does not fail fast enough to be worth using. If a Novell MonoTouch developer wants our source, we can provide it subject to the obvious confidentiality.

mj2008 asked Apr 28 '11 13:04


1 Answer

Thanks to your reproduction we have found and corrected a very obscure issue in the garbage collector. It will be included in MonoTouch 4.0.2.

Geoff Norton answered Oct 16 '22 14:10