 

How to properly link object files written in Haskell?

Roughly following this tutorial, I managed to get this toy project working. It calls a Haskell function from a C++ program.

  • Foo.hs

    {-# LANGUAGE ForeignFunctionInterface #-}
    
    module Foo where
    
    foreign export ccall foo :: Int -> Int -> IO Int
    
    foo :: Int -> Int -> IO Int
    foo n m = return . sum $ f n ++ f m
    
    f :: Int -> [Int]
    f 0 = []
    f n = n : f (n-1)
    
  • bar.c++

    #include "HsFFI.h"
    #include FOO       // Haskell module (path defined in build script)
    
    #include <iostream>
    
    int main(int argc, char *argv[]) {
      hs_init(&argc, &argv);
    
      std::cout << foo(37, 19) << "\n";
    
      hs_exit();
      return 0;
    }
    
  • call-haskell-from-cxx.cabal

    name:                call-haskell-from-cxx
    version:             0.1.0.0
    build-type:          Simple
    cabal-version:       >=1.10
    
    executable foo.so
      main-is:          Foo.hs   
      build-depends:       base >=4.10 && <4.11
      ghc-options:         -shared -fPIC -dynamic
      extra-libraries:     HSrts-ghc8.2.1
      default-language:    Haskell2010
    
  • build script

    #!/bin/bash
    
    hs_lib="foo.so"
    hs_obj="dist/build/$hs_lib/$hs_lib"
    
    ghc_version="8.2.1"                          # May need to be tweaked,
    ghc_libdir="/usr/local/lib/ghc-$ghc_version" # depending on system setup.
    
    set -x
    
    cabal build
    
    g++ -I "$ghc_libdir/include" -D"FOO=\"${hs_obj}-tmp/Foo_stub.h\"" -c bar.c++ -o test.o
    g++ test.o "$hs_obj" \
       -L "$ghc_libdir/rts" "-lHSrts-ghc$ghc_version" \
       -o test
    
    env LD_LIBRARY_PATH="dist/build/$hs_lib:$ghc_libdir/rts:$LD_LIBRARY_PATH" \
      ./test
    

This works (Ubuntu 16.04, GCC 5.4.0), printing 893, but it isn't really robust: if I remove the actual invocation of the Haskell function, i.e. the std::cout << foo(37, 19) << "\n"; line, then it fails at the linking step with the error message

/usr/local/lib/ghc-8.2.1/rts/libHSrts-ghc8.2.1.so: undefined reference to `base_GHCziTopHandler_flushStdHandles_closure'
/usr/local/lib/ghc-8.2.1/rts/libHSrts-ghc8.2.1.so: undefined reference to `base_GHCziStable_StablePtr_con_info'
/usr/local/lib/ghc-8.2.1/rts/libHSrts-ghc8.2.1.so: undefined reference to `base_GHCziPtr_FunPtr_con_info'
/usr/local/lib/ghc-8.2.1/rts/libHSrts-ghc8.2.1.so: undefined reference to `base_GHCziWord_W8zh_con_info'
/usr/local/lib/ghc-8.2.1/rts/libHSrts-ghc8.2.1.so: undefined reference to `base_GHCziIOziException_cannotCompactPinned_closure'
...

Apparently, using the Haskell function pulls in additional library files that are needed. How do I explicitly depend on everything necessary, to avoid such brittleness?


Output of the build script when the foo call is included, with ldd on the final executable:

++ cabal build
Preprocessing executable 'foo.so' for call-haskell-from-C-0.1.0.0..
Building executable 'foo.so' for call-haskell-from-C-0.1.0.0..
Linking a.out ...
Linking dist/build/foo.so/foo.so ...
++ g++ -I /usr/local/lib/ghc-8.2.1/include '-DFOO="dist/build/foo.so/foo.so-tmp/Foo_stub.h"' -c bar.c++ -o test.o
++ g++ test.o dist/build/foo.so/foo.so -L /usr/local/lib/ghc-8.2.1/rts -lHSrts-ghc8.2.1 -o test
++ env LD_LIBRARY_PATH=dist/build/foo.so:/usr/local/lib/ghc-8.2.1/rts: sh -c 'ldd ./test; ./test'
    linux-vdso.so.1 =>  (0x00007fff23105000)
    foo.so => dist/build/foo.so/foo.so (0x00007fdfc5360000)
    libHSrts-ghc8.2.1.so => /usr/local/lib/ghc-8.2.1/rts/libHSrts-ghc8.2.1.so (0x00007fdfc52f8000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fdfc4dbe000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdfc49f4000)
    libHSbase-4.10.0.0-ghc8.2.1.so => /usr/local/lib/ghc-8.2.1/base-4.10.0.0/libHSbase-4.10.0.0-ghc8.2.1.so (0x00007fdfc4020000)
    libHSinteger-gmp-1.0.1.0-ghc8.2.1.so => /usr/local/lib/ghc-8.2.1/integer-gmp-1.0.1.0/libHSinteger-gmp-1.0.1.0-ghc8.2.1.so (0x00007fdfc528b000)
    libHSghc-prim-0.5.1.0-ghc8.2.1.so => /usr/local/lib/ghc-8.2.1/ghc-prim-0.5.1.0/libHSghc-prim-0.5.1.0-ghc8.2.1.so (0x00007fdfc3b80000)
    libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fdfc3900000)
    libffi.so.6 => /usr/local/lib/ghc-8.2.1/rts/libffi.so.6 (0x00007fdfc36f3000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdfc33ea000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fdfc31e2000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdfc2fde000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdfc2dc1000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fdfc5140000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fdfc2bab000)
Asked May 28 '18 at 15:05 by leftaroundabout




2 Answers

This answer explains what happens during linking, why the -Wl,--no-as-needed solution works, and what could be done instead for a somewhat more robust approach.

In a nutshell: libHSrts-ghcXXX.so (i.e. -lHSrts-ghcXXX) depends on libHSbase-XXX.so, libHSinteger-gmp-XXX.so and libHSghc-prim-XXX.so, which must be provided to the linker during linking.

The solution proposed here requires a lot of manual work and does not scale well. I don't know enough about cabal to tell you how to automate this, but I hope you can take that last step yourself.

Or maybe you will be fine with the -Wl,--no-as-needed solution, now that you know what happens behind the scenes.


Let's start by stepping through the linking process, in a somewhat simplified manner, for the version that doesn't call foo (there is a great article by Eli Bendersky on this, even if it is about static linking):

  1. The linker maintains a table of symbols and has to find definitions (machine code) for all of them. To simplify, assume that at the beginning only the symbol main is in the table and its definition is unknown.

  2. The definition of main is found in the object file test.o. However, main uses the functions hs_init and hs_exit. So we have found the definition of main, but it is of no use unless we also have the definitions of hs_init and hs_exit, so the linker now has to look for those.

  3. In the next step, the linker looks at foo.so, but foo.so doesn't define any symbol we are interested in (foo is not used!), so the linker skips foo.so and never looks back. (This is the --as-needed behavior, which Ubuntu's GCC passes to the linker by default.)

  4. The linker looks at -lHSrts-ghcXXX.so. There it finds the definitions of hs_init and hs_exit, so this shared library is used. But it, in turn, needs definitions of symbols such as base_GHCziTopHandler_flushStdHandles_closure, so the linker starts to look for those.

  5. However, there are no more libraries on the command line, so the linker has nothing left to look at, and linking fails because definitions of some symbols are missing.

What is different when foo is used? After step 2, not only hs_init and hs_exit are wanted, but also foo, which is found in foo.so. So foo.so must be included.
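To make steps 1 to 5 and the foo case concrete, here is a toy C++ model of the simplified process described above. It is only a sketch under stated assumptions, not how GNU ld is implemented: each input is reduced to the symbols it defines and references plus its NEEDED list, each library gets one representative symbol, the "skip a library that resolves nothing" rule stands in for --as-needed, and libHSrts is assumed to reference base symbols without listing libHSbase as NEEDED (which is what the error message suggests). The names Input, unresolved_after_link and link_test_program are made up for this illustration.

```cpp
// Toy model of the simplified linking steps; NOT how GNU ld is implemented.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Input {
    std::string name;
    std::set<std::string> defines;     // symbols this input provides
    std::set<std::string> references;  // symbols this input itself needs
    std::vector<std::string> needed;   // NEEDED entries (shared libraries only)
};

// Processes the command line left to right. Like --as-needed, a library that
// resolves no currently undefined symbol is skipped and never revisited (step 3).
// Afterwards, the NEEDED entries of the libraries that were used are followed.
// Returns the symbols that are still undefined.
std::set<std::string> unresolved_after_link(const std::vector<Input>& cmdline,
                                            const std::map<std::string, Input>& installed) {
    std::set<std::string> defined;
    std::set<std::string> undefined{"main"};  // step 1
    std::vector<std::string> to_follow;

    auto use = [&](const Input& in) {
        for (const auto& s : in.defines) { defined.insert(s); undefined.erase(s); }
        for (const auto& s : in.references)
            if (defined.count(s) == 0) undefined.insert(s);
        to_follow.insert(to_follow.end(), in.needed.begin(), in.needed.end());
    };

    for (const auto& in : cmdline) {
        bool resolves_something = false;
        for (const auto& s : in.defines)
            if (undefined.count(s) != 0) resolves_something = true;
        if (resolves_something) use(in);
    }
    std::set<std::string> visited;
    while (!to_follow.empty()) {
        const std::string lib = to_follow.back();
        to_follow.pop_back();
        if (!visited.insert(lib).second) continue;
        auto it = installed.find(lib);
        if (it != installed.end()) use(it->second);
    }
    return undefined;
}

// The question's scenario, with one representative symbol per library.
std::set<std::string> link_test_program(bool calls_foo) {
    const std::string flush = "base_GHCziTopHandler_flushStdHandles_closure";
    Input test_o{"test.o", {"main"}, {"hs_init", "hs_exit"}, {}};
    if (calls_foo) test_o.references.insert("foo");
    // foo.so's own references are omitted; its NEEDED list mirrors the readelf output.
    Input foo_so{"foo.so", {"foo"}, {}, {"libHSrts", "libHSbase"}};
    // Assumption: libHSrts references base symbols but does not list libHSbase as NEEDED.
    Input rts{"libHSrts", {"hs_init", "hs_exit"}, {flush}, {}};
    const std::map<std::string, Input> installed{
        {"libHSrts", rts}, {"libHSbase", Input{"libHSbase", {flush}, {}, {}}}};
    return unresolved_after_link({test_o, foo_so, rts}, installed);  // steps 2 to 5
}
```

In this model, link_test_program(false) leaves base_GHCziTopHandler_flushStdHandles_closure unresolved, while link_test_program(true) resolves everything, because foo.so is used and its NEEDED entries bring libHSbase in.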

Because of the way foo.so was built, it contains the following information:

>>> readelf -d dist/build/foo.so/foo.so | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libHSrts-ghc7.10.3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSbase-4.8.2.0-HQfYBxpPvuw8OunzQu6JGM-ghc7.10.3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSinteger-gmp-1.0.0.0-2aU3IZNMF9a7mQ0OzsZ0dS-ghc7.10.3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libHSghc-prim-0.4.0.0-8TmvWUcS1U1IKHT0levwg3-ghc7.10.3.so]
 0x0000000000000001 (NEEDED)             Shared library: [libgmp.so.10]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

>>> readelf -d dist/build/foo.so/foo.so | grep RPATH
 0x000000000000000f (RPATH)              Library rpath: [
          /usr/lib/ghc/base_HQfYBxpPvuw8OunzQu6JGM:
          /usr/lib/ghc/rts:
          /usr/lib/ghc/ghcpr_8TmvWUcS1U1IKHT0levwg3:
          /usr/lib/ghc/integ_2aU3IZNMF9a7mQ0OzsZ0dS]

From this information, the linker knows which shared libraries are needed (NEEDED flag) and where they can be found on your system (RPATH). These libraries are found, opened and processed (i.e. marked as needed), and thus all necessary definitions are available.
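The NEEDED and RPATH entries that readelf prints come from the ELF dynamic section, so they can also be read programmatically. This is a minimal C++ sketch, assuming a Linux system with glibc's <elf.h>, a 64-bit ELF file in the host's byte order, and only basic bounds checking. Newer linkers often write RUNPATH instead of RPATH, so both tags are reported. The function name dynamic_entries is made up for this illustration.

```cpp
// Sketch: list the DT_NEEDED and DT_RPATH/DT_RUNPATH entries of a 64-bit ELF
// file, roughly what `readelf -d FILE | grep -E 'NEEDED|RPATH'` shows.
#include <elf.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Entries = std::vector<std::pair<std::string, std::string>>;

// Returns (tag, value) pairs such as {"NEEDED", "libc.so.6"}; empty on failure.
Entries dynamic_entries(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    const std::vector<char> buf((std::istreambuf_iterator<char>(file)),
                                std::istreambuf_iterator<char>());
    Entries out;
    if (buf.size() < sizeof(Elf64_Ehdr) ||
        std::memcmp(buf.data(), ELFMAG, SELFMAG) != 0 || buf[EI_CLASS] != ELFCLASS64)
        return out;  // not a 64-bit ELF file

    Elf64_Ehdr eh;
    std::memcpy(&eh, buf.data(), sizeof eh);
    if (eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shoff + std::size_t{eh.e_shnum} * sizeof(Elf64_Shdr) > buf.size())
        return out;  // no usable section header table
    auto section = [&](std::size_t i) {
        Elf64_Shdr sh;
        std::memcpy(&sh, buf.data() + eh.e_shoff + i * sizeof(Elf64_Shdr), sizeof sh);
        return sh;
    };

    for (std::size_t i = 0; i < eh.e_shnum; ++i) {
        const Elf64_Shdr dyn = section(i);
        if (dyn.sh_type != SHT_DYNAMIC || dyn.sh_link >= eh.e_shnum ||
            dyn.sh_offset + dyn.sh_size > buf.size())
            continue;
        // sh_link of the dynamic section points at its string table (usually .dynstr).
        const char* strtab = buf.data() + section(dyn.sh_link).sh_offset;
        for (std::size_t off = 0; off + sizeof(Elf64_Dyn) <= dyn.sh_size;
             off += sizeof(Elf64_Dyn)) {
            Elf64_Dyn d;
            std::memcpy(&d, buf.data() + dyn.sh_offset + off, sizeof d);
            if (d.d_tag == DT_NULL) break;  // end of the dynamic array
            const char* tag = d.d_tag == DT_NEEDED    ? "NEEDED"
                            : d.d_tag == DT_RPATH     ? "RPATH"
                            : d.d_tag == DT_RUNPATH   ? "RUNPATH"
                                                      : nullptr;
            if (tag) out.emplace_back(tag, strtab + d.d_un.d_val);  // d_val = string offset
        }
    }
    return out;
}
```

For example, dynamic_entries("dist/build/foo.so/foo.so") should list the same NEEDED libraries and search path as the readelf output above.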

You can follow the whole process by adding

g++ ...
    -Wl,--trace-symbol=base_GHCziTopHandler_flushStdHandles_closure \
    -Wl,--verbose \
    -o test

to the link step.

The same thing happens if we force foo.so to be included in the resulting executable via -Wl,--no-as-needed, as suggested by @Yuras.

What is the consequence of this analysis?

We should provide the needed libraries on the command line (after -lHSrts-ghcXXX) and not depend on them being added by chance through other shared libraries. Obviously, the somewhat cryptic names are only valid for my installation:

g++ ...
   -L/usr/lib/ghc/base_HQfYBxpPvuw8OunzQu6JGM  -lHSbase-4.8.2.0-HQfYBxpPvuw8OunzQu6JGM-ghc7.10.3 \
   -L/usr/lib/ghc/integ_2aU3IZNMF9a7mQ0OzsZ0dS -lHSinteger-gmp-1.0.0.0-2aU3IZNMF9a7mQ0OzsZ0dS-ghc7.10.3 \
   -L/usr/lib/ghc/ghcpr_8TmvWUcS1U1IKHT0levwg3 -lHSghc-prim-0.4.0.0-8TmvWUcS1U1IKHT0levwg3-ghc7.10.3 \
   ...
   -o test

Now it builds, but it doesn't load at run time (after all, the right rpath is only set in foo.so, and foo.so isn't used). To fix this, we could either extend LD_LIBRARY_PATH or add -rpath to the link command line:

g++ ...
   -L...  -lHSbase-...  -Wl,-rpath,/usr/lib/ghc/base_HQfYBxpPvuw8OunzQu6JGM  \
   -L... -lHSinteger-gmp-... -Wl,-rpath,/usr/lib/ghc/integ_2aU3IZNMF9a7mQ0OzsZ0dS \
   -L... -lHSghc-prim-...  -Wl,-rpath,/usr/lib/ghc/ghcpr_8TmvWUcS1U1IKHT0levwg3 \
   ...
   -o test

There must be a utility to get the paths and library names automatically (cabal seems to do it when building foo.so), but I don't know how, because I have no experience with Haskell/cabal.
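As a rough illustration of what such a utility could do, this hypothetical C++ helper (not part of GHC, cabal or binutils) turns absolute library paths, for example the ones ldd prints for the working executable in the question, into the -L / -l / -Wl,-rpath triples used above. It assumes file names of the form lib<NAME>.so; versioned names such as libgmp.so.10 are not handled (GNU ld's -l:libgmp.so.10 syntax would be one way to pass those).

```cpp
// Hypothetical helper: turns /usr/lib/ghc/rts/libHSrts-ghc7.10.3.so into
// "-L/usr/lib/ghc/rts -lHSrts-ghc7.10.3 -Wl,-rpath,/usr/lib/ghc/rts".
#include <cassert>
#include <string>
#include <vector>

std::string link_flags_for(const std::string& so_path) {
    const std::string::size_type slash = so_path.rfind('/');
    assert(slash != std::string::npos && "expected a path with a directory part");
    const std::string dir = so_path.substr(0, slash);
    std::string name = so_path.substr(slash + 1);
    if (name.rfind("lib", 0) == 0) name.erase(0, 3);  // drop the "lib" prefix
    const std::string suffix = ".so";                  // versioned names (.so.10) not handled
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
        name.erase(name.size() - suffix.size());       // drop the ".so" suffix
    return "-L" + dir + " -l" + name + " -Wl,-rpath," + dir;
}

// Joins the flags for several libraries, keeping their command-line order.
std::string link_flags_for_all(const std::vector<std::string>& so_paths) {
    std::string flags;
    for (const auto& path : so_paths) {
        if (!flags.empty()) flags += ' ';
        flags += link_flags_for(path);
    }
    return flags;
}
```

Fed with the libHSbase, libHSinteger-gmp and libHSghc-prim paths from the question's ldd output, it produces flags in the same pattern as the command lines above, but with that system's directory and library names.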

Answered Oct 21 '22 at 17:10 by ead


Usually ghc links executables with the -Wl,--no-as-needed option, and you should use it too. (You can check how ghc links an executable, e.g. using cabal build --ghc-options=-v3.)

You can find more details here. My understanding is the following: foo.so requires libHSbase-4.10.0.0-ghc8.2.1.so to be loaded at runtime as needed, i.e. only when we need a symbol from it (check readelf -a dist/build/foo.so/foo.so | grep NEEDED). So if you don't call foo, then base.so is not loaded. But ghc needs all libraries to be loaded (I don't know why). The --no-as-needed option forces all libraries to be loaded.

Note that the --no-as-needed option is position-dependent, so put it before the shared libraries it should apply to.
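Applied to the question's build script, this means -Wl,--no-as-needed has to come before foo.so and the RTS library on the final g++ line. The hypothetical C++ helper below only assembles that argument list, to make the ordering explicit; the library arguments may include any -L options. Switching back to -Wl,--as-needed afterwards is an optional assumption, meant to limit the effect to the Haskell libraries.

```cpp
// Hypothetical helper: builds the argument vector for the final link step so
// that -Wl,--no-as-needed precedes the libraries it should apply to.
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> link_command(const std::vector<std::string>& objects,
                                      const std::vector<std::string>& haskell_libs,
                                      const std::string& output) {
    std::vector<std::string> argv{"g++"};
    argv.insert(argv.end(), objects.begin(), objects.end());
    argv.push_back("-Wl,--no-as-needed");  // position matters: only affects later libraries
    argv.insert(argv.end(), haskell_libs.begin(), haskell_libs.end());
    argv.push_back("-Wl,--as-needed");     // optional: restore Ubuntu's default afterwards
    argv.push_back("-o");
    argv.push_back(output);
    return argv;
}
```

For the question's setup, link_command({"test.o"}, {"dist/build/foo.so/foo.so", "-lHSrts-ghc8.2.1"}, "test") yields g++ test.o -Wl,--no-as-needed dist/build/foo.so/foo.so -lHSrts-ghc8.2.1 -Wl,--as-needed -o test (the -L option for the RTS directory would still need to be added).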

Answered Oct 21 '22 at 17:10 by Yuras