
How to write a pure String to String function in Haskell FFI to C++

I want to implement a function in C++ and call it through the Haskell FFI, with the (final) type String -> String. For example, is it possible to re-implement the following function in C++ with exactly the same signature?

import Data.Char
toUppers :: String -> String
toUppers s = map toUpper s

In particular, I want to avoid IO in the return type, because introducing impurity (by that I mean the IO monad) for this simple task is logically unnecessary. All examples involving a C string that I have seen so far return an IO something or a Ptr, which cannot be converted back to a pure String.

The reason I want to do this is that I have the impression that marshaling is messy with FFI. Maybe if I can fix the simplest case above (other than primitive types such as int), then I can do whatever data parsing I want on the C++ side, which should be easy.

The cost of parsing is negligible compared to the computation that I want to do between the marshalling to/from strings.

Thanks in advance.

thor asked Jun 02 '13 23:06


1 Answer

You need to involve IO at least at some point, to allocate buffers for the C-strings. The straightforward solution here would probably be:

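import Foreign
import Foreign.C
import System.IO.Unsafe as Unsafe

-- C++ side: extern "C" void touppers(char *s), upper-cases the buffer in place.
foreign import ccall "touppers" c_touppers :: CString -> IO ()

toUppers :: String -> String
toUppers s =
  -- unsafePerformIO hides the IO; this is only sound if touppers has no other side effects.
  Unsafe.unsafePerformIO $
    withCString s $ \cs ->
      c_touppers cs >> peekCString cs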

Here withCString marshals the Haskell string into a temporary null-terminated buffer, the C++ function upper-cases that buffer in place, and peekCString un-marshals the (changed!) buffer contents into a new Haskell string.

Another solution could be to delegate messing with IO to the bytestring library. That could be a good idea anyway if you are interested in performance. The solution would look roughly as follows:

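import Data.ByteString.Internal (ByteString, toForeignPtr, unsafeCreate)
import Foreign
import Foreign.C.Types (CInt)

-- C++ side: extern "C" void touppers2(int l, char *s, char *t)
foreign import ccall "touppers2"
  c_touppers2 :: CInt -> Ptr Word8 -> Ptr Word8 -> IO ()

toUppers2 :: ByteString -> ByteString
toUppers2 s =
  unsafeCreate l $ \p2 ->        -- p2: fresh output buffer of l bytes
    withForeignPtr fp $ \p1 ->   -- p1: start of the input's underlying buffer
      c_touppers2 (fromIntegral l) (p1 `plusPtr` o) p2
 where (fp, o, l) = toForeignPtr s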

This is a bit more elegant, as we now don't actually have to do any marshalling, just convert pointers. On the other hand, the C++ side changes in two respects - we have to handle possibly non-null-terminated strings (need to pass the length) and now have to write to a different buffer, as the input is not a copy anymore.


For reference, here are two quick-and-dirty C++ functions that fit the above imports:

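#include <ctype.h>

// In-place variant: s must be a writable, null-terminated buffer.
extern "C" void touppers(char *s) {
    // Cast to unsigned char first: toupper is undefined for negative char values.
    for (; *s; s++) *s = toupper((unsigned char)*s);
}

// Length-based variant: reads l bytes from s and writes l bytes to t.
extern "C" void touppers2(int l, const char *s, char *t) {
    for (int i = 0; i < l; i++) t[i] = toupper((unsigned char)s[i]);
}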
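
If the real work on the C++ side is written against std::string rather than raw pointers, the same length-plus-output-buffer convention can wrap it. The following is only a sketch under assumptions: the names to_upper_copy and touppers_via_string are hypothetical and not part of the code above, and it assumes the result has the same length as the input, so the caller's output buffer of len bytes is large enough.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: ordinary C++ code that works on std::string.
static std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// Hypothetical C-ABI wrapper with the same shape as touppers2:
// reads len bytes from in (not necessarily null-terminated) and
// writes exactly len bytes to out. No terminator is written.
extern "C" void touppers_via_string(int len, const char *in, char *out) {
    if (len <= 0) return;  // nothing to read or write
    const std::string input(in, static_cast<std::size_t>(len));
    const std::string result = to_upper_copy(input);
    // Assumption: result.size() == len, which holds for byte-wise upper-casing.
    std::copy(result.begin(), result.end(), out);
}
```

A function of this shape could be imported on the Haskell side with the same type as c_touppers2. Note that C++ exceptions must not escape through an extern "C" function into Haskell, so a production version would need to catch them inside the wrapper; and if the output length could differ from the input length, the interface would need to change (for example, by returning the required size).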
Peter Wortmann answered Nov 09 '22 18:11