Calling a shared library from Haskell via FFI blocks, while it doesn't when linked from a C program

Tags: c++, c, haskell, ffi

I'm trying to interface with a Basler USB3 camera from a Haskell application, but I'm running into difficulty. The camera comes with a C++ library that makes this fairly straightforward. The following code can be used to acquire a camera:

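// Assumed includes and namespaces; the original excerpt omitted them.
#include <iostream>
#include <pylon/PylonIncludes.h>

using namespace Pylon;
using namespace std;

extern "C" {
  void basler_init() {
    // Initialises the pylon runtime and terminates it when this scope ends.
    PylonAutoInitTerm pylon;
    // Attach to the first device found; this is where enumeration happens.
    CInstantCamera camera(CTlFactory::GetInstance().CreateFirstDevice());
    camera.RegisterConfiguration((CConfigurationEventHandler*) NULL, RegistrationMode_ReplaceAll, Cleanup_None);
    cout << "Using device " << camera.GetDeviceInfo().GetModelName() << endl;
  }
}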

I've built this source code into a shared library, libbasler.so. To confirm it works, here's a basic C program that links against it:

/* Exported with C linkage by libbasler.so. */
void basler_init(void);

int main(void) {
  basler_init();
  return 0;
}

I compile and run this as:

$ gcc Test2.c -lbasler -Llib -Wl,--enable-new-dtags -Wl,-rpath,pylon5/lib64 -Wl,-E -lpylonbase -o Test2-c

$ PYLON_CAMEMU=1 LD_LIBRARY_PATH=lib ./Test2-c
Using device Emulation

This is the expected output.

However, when I try to use this from Haskell, the behaviour changes and the program blocks indefinitely. Here's the Haskell source code:

{-# LANGUAGE ForeignFunctionInterface #-}

foreign import ccall "basler_init" baslerInit :: IO ()

main :: IO ()
main = baslerInit

I compile and run this as:

$ ghc --make Test2.hs -o Test2-haskell -Llib -lbasler -optl-Wl,--enable-new-dtags -optl-Wl,-rpath,pylon5/lib64 -optl-Wl,-E -lpylonbase

$ PYLON_CAMEMU=1 LD_LIBRARY_PATH=lib ./Test2-haskell

The application now hangs indefinitely.

I have run both through strace to get an idea of what is going on, but I can't make much sense of the output. It is too long to include here, so please see these two pastes:

  • strace output for the C application: https://gist.github.com/ocharles/001b5f42c09229bc7a8482a22cadf486
  • strace output for the Haskell application: https://gist.github.com/ocharles/4c1c45a9ee78f75cd723f1a2910998f3

On top of that, I've used gdb to try to ascertain where the Haskell application is getting stuck:

$ PYLON_CAMEMU=1 LD_LIBRARY_PATH=lib gdb Test2-haskell 
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from Test2-haskell...done.
(gdb) run
Starting program: /home/ollie/work/circuithub/receiving-station/Test2-haskell 
warning: File "/nix/store/9ljgbhb26ca0j9shwh8bwsa77h42izr2-gcc-5.4.0-lib/lib/libstdc++.so.6.0.21-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /nix/store/9ljgbhb26ca0j9shwh8bwsa77h42izr2-gcc-5.4.0-lib/lib/libstdc++.so.6.0.21-gdb.py
line to your configuration file "/home/ollie/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/ollie/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/bb32xf954imhdrzn7j8h82xs1bx7p3fr-glibc-2.23/lib/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6c6fb33 in __recvfrom_nocancel () from /nix/store/98s2znxww6x7h2ch7cj1w5givahxmdna-glibc-2.23/lib/libc.so.6
(gdb) bt
#0  0x00007ffff6c6fb33 in __recvfrom_nocancel () from /nix/store/98s2znxww6x7h2ch7cj1w5givahxmdna-glibc-2.23/lib/libc.so.6
#1  0x00007fffedb885c2 in GxImp::CEnumCollector::OnReady(unsigned int, _GX_SOCKET_INTERFACE_INFO const*) () from /home/ollie/work/circuithub/receiving-station/pylon5/lib64/libgxapi-5.0.1.so
#2  0x00007fffedb8d54d in CCollector::Collect(GxImp::CSocket*, unsigned int, unsigned int, _GX_SOCKET_INTERFACE_INFO const*) () from /home/ollie/work/circuithub/receiving-station/pylon5/lib64/libgxapi-5.0.1.so
#3  0x00007fffedb8817b in CBroadcastSocketCollection::Collect(CCollector&, unsigned int) () from /home/ollie/work/circuithub/receiving-station/pylon5/lib64/libgxapi-5.0.1.so
#4  0x00007fffedb889ab in Gx::Enumerator::Discover(Gx::Enumerator::Callee*, unsigned int, unsigned int, sockaddr const*) () from /home/ollie/work/circuithub/receiving-station/pylon5/lib64/libgxapi-5.0.1.so
#5  0x00007fffeddeaca0 in Pylon::CBaslerGigETl::DoDeviceEnumeration(Pylon::DeviceInfoList&, bool, sockaddr const*) () from pylon5/lib64/libpylon_TL_gige-5.0.1.so
#6  0x00007fffeddeaebc in Pylon::CBaslerGigETl::InternalEnumerateDevices(Pylon::DeviceInfoList&) () from pylon5/lib64/libpylon_TL_gige-5.0.1.so
#7  0x00007fffeddf3c99 in Pylon::CTransportLayerBase<Pylon::IGigETransportLayer>::EnumerateDevices(Pylon::DeviceInfoList&, Pylon::DeviceInfoList const&, bool) () from pylon5/lib64/libpylon_TL_gige-5.0.1.so
#8  0x00007ffff7949669 in Pylon::CTlFactory::EnumerateDevices(Pylon::DeviceInfoList&, Pylon::DeviceInfoList const&, bool) () from pylon5/lib64/libpylonbase-5.0.1.so
#9  0x00007ffff7949c8f in Pylon::CTlFactory::InternalCreateDevice(Pylon::CDeviceInfo const&, GenICam_3_0_Basler_pylon_v5_0::gcstring_vector const&, bool) () from pylon5/lib64/libpylonbase-5.0.1.so
#10 0x00007ffff794a655 in Pylon::CTlFactory::CreateFirstDevice(Pylon::CDeviceInfo const&) () from pylon5/lib64/libpylonbase-5.0.1.so
#11 0x00007ffff7bd7dc5 in basler_init () from lib/libbasler.so
#12 0x0000000000438415 in rFl_info ()
#13 0x0000000000000000 in ?? ()

For the C program:

Reading symbols from Test2-c...done.
(gdb) run
Starting program: /home/ollie/work/circuithub/receiving-station/Test2-c 
warning: File "/nix/store/9ljgbhb26ca0j9shwh8bwsa77h42izr2-gcc-5.4.0-lib/lib/libstdc++.so.6.0.21-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /nix/store/9ljgbhb26ca0j9shwh8bwsa77h42izr2-gcc-5.4.0-lib/lib/libstdc++.so.6.0.21-gdb.py
line to your configuration file "/home/ollie/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/ollie/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/bb32xf954imhdrzn7j8h82xs1bx7p3fr-glibc-2.23/lib/libthread_db.so.1".
[New Thread 0x7fffed4ae700 (LWP 13792)]
[New Thread 0x7fffeccad700 (LWP 13793)]
Using device Emulation
[Thread 0x7fffeccad700 (LWP 13793) exited]
[Thread 0x7fffed4ae700 (LWP 13792) exited]
[Inferior 1 (process 13788) exited normally]

My guess is GHC's runtime is doing something that causes pthreads to have different behaviour, but I'm not sure what that could be.

asked Aug 24 '16 10:08 by ocharles


1 Answer

I believe this GHC Trac commentary is relevant:

https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Signals

The two traces start to differ at line 437 in the C strace output and line 495 in the Haskell strace output.

At this point the library creates two UDP sockets and sends two UDP datagrams (lines 448-449 C / lines 506-507 Haskell). The packets are broadcast to two local networks: 192.168.1.0/24 and 192.168.56.0/24.

It then waits for a response on either of these sockets with a timeout of (apparently) 25 microseconds (line 450 C / line 508 Haskell). In the C case, the select call times out. In the Haskell case, the select call is repeatedly interrupted by a SIGVTALRM signal, which the GHC RTS uses for its timer. This is the same pattern shown in the Trac commentary above.
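To make the interruption concrete, here is a minimal C sketch, independent of pylon and GHC, of a periodic timer signal cutting a select call short. It is an illustration under assumptions: the function name is hypothetical, and it uses SIGALRM with ITIMER_REAL as a stand-in for the RTS timer, which delivers SIGVTALRM.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Number of timer signals handled so far. */
static volatile sig_atomic_t ticks = 0;

static void on_tick(int signo) { (void)signo; ticks++; }

/* Hypothetical helper: arms a periodic timer, then waits in select().
 * Returns 1 if select() failed with EINTR, 0 if it timed out normally,
 * and -1 on any other error. Both arguments must be below 1000000. */
int select_interrupted_by_timer(long tick_usec, long timeout_usec)
{
    struct sigaction sa, old_sa;
    struct itimerval tick, off;
    struct timeval timeout;
    int rc, saved_errno;

    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* select() is not restarted even with SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    /* Periodic tick, standing in for the RTS timer signal. */
    tick.it_interval.tv_sec = 0;
    tick.it_interval.tv_usec = tick_usec;
    tick.it_value = tick.it_interval;
    if (setitimer(ITIMER_REAL, &tick, NULL) != 0)
        return -1;

    /* No descriptors: select() just waits for the timeout or a signal. */
    timeout.tv_sec = 0;
    timeout.tv_usec = timeout_usec;
    rc = select(0, NULL, NULL, NULL, &timeout);
    saved_errno = errno;

    /* Disarm the timer and restore the previous handler. */
    off.it_interval.tv_sec = 0;
    off.it_interval.tv_usec = 0;
    off.it_value = off.it_interval;
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);

    if (rc == 0)
        return 0;
    return (rc == -1 && saved_errno == EINTR) ? 1 : -1;
}
```

With a 10 ms tick and a 200 ms timeout, the first tick arrives long before the timeout expires, so select fails with EINTR instead of timing out. Code that simply retries after EINTR with a fresh timeout can then keep looping for as long as the timer keeps firing.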

For a potential fix, have a look at how the mysql package implements and uses the block_rts_signals() and unblock_rts_signals() macros:

  • mysql_signals.c
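
The general idea can be sketched in C as follows. This is not the mysql package's code: the helper names are hypothetical, and the sketch assumes it is enough to block SIGALRM and SIGVTALRM in the calling thread for the duration of the foreign call. It uses sigprocmask to stay free of extra link flags; in a multi-threaded program, pthread_sigmask is the call to use.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: runs fn with the timer signals blocked, then
 * restores the previous signal mask. Returns 0 on success, -1 on error. */
int call_with_rts_signals_blocked(void (*fn)(void))
{
    sigset_t rts_signals, saved;

    assert(fn != NULL);

    sigemptyset(&rts_signals);
    sigaddset(&rts_signals, SIGALRM);
    sigaddset(&rts_signals, SIGVTALRM);

    /* Block the signals; a tick that arrives now stays pending. */
    if (sigprocmask(SIG_BLOCK, &rts_signals, &saved) != 0)
        return -1;

    fn();

    /* Restore the mask that was active before the call. */
    return sigprocmask(SIG_SETMASK, &saved, NULL);
}

/* Example callee: records whether SIGVTALRM is blocked while it runs. */
static int vtalrm_was_blocked = -1;

static void probe_mask(void)
{
    sigset_t current;
    if (sigprocmask(SIG_BLOCK, NULL, &current) == 0)
        vtalrm_was_blocked = sigismember(&current, SIGVTALRM);
}
```

For the library in the question, one option would be to export a second C function from libbasler.so that passes basler_init to such a helper, and to import that function from Haskell instead. Threads created during the call inherit the blocked mask, so the library's worker threads would not see the timer signal either.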
answered Sep 28 '22 01:09 by ErikR