Track down Haskell segfault

Tags:

haskell

This is so infuriating! >_<

I've written a huge, complicated Haskell library. I wrote a small test program, and so far I've spent about 8 hours trying to figure out why the hell it keeps crashing on me. Sometimes GHC complains about a "strange closure type". Sometimes I just get a segfault. Clearly the problem is memory corruption.

The library itself is 100% pure Haskell. However, the test program uses several unsafe GHC primitives relating to arrays. This is obviously what's causing the problem. Indeed, if I comment out the writeArray# line, the program stops crashing. But this is utterly frying my noodle... as best as I can tell, all the array bounds I've used are perfectly valid. The program prints them all out, and they're all positive and less than the array size.

I wrote a second program that does the same thing as the first one, but without involving the huge, complex library. I've tried and tried and tried, but I can't make it crash at all. Nothing I do seems to make it crash, and yet it does almost exactly the same thing with the actual arrays.

Does anybody have any further troubleshooting tips? Is there some way I can track down the exact moment when memory is getting corrupted? (Rather than just the moment when the system notices the corruption.)


Update:

What does the program do?

Well, essentially, it creates an array representing a pixel buffer. It spawns one thread that iterates over every pixel and writes the corresponding value into it. And it spawns a second thread that reads the array, and writes the pixels to a network socket using a fairly complicated protocol. (Hence the large library I'm trying to test.)

If I don't spawn the writer thread, the crash goes away. If I comment out the writeArray# call in the writer thread, the crash goes away. Before writing each pixel, the writer thread prints out the pixel coordinates and the array index. Everything it prints out looks perfectly A-OK. And yet... it will not stop crashing.

I almost wonder if GHC's array primitives aren't thread-safe or something. (In case it makes any difference, the copy of the array that the reader thread looks at has been unsafe-frozen, while the writer thread continues to concurrently mutate it.)

However, I've written a program that does the exact same thing, but without sending traffic over the network. This program works perfectly in every detail. It's only the really complicated program that won't work. How annoying is that?!

This works: http://hpaste.org/70987

This does not: http://hpaste.org/70988

MathematicalOrchid asked Jul 06 '12 10:07


3 Answers

You're already logging your use of unsafe primitives.

Have you written a program to look through these logs for violations of invariants?
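Such a log checker could be sketched as below. This is only an illustration: the log format (one "x y index" triple per line), the row-major layout (index = y * width + x), the 640x480 dimensions, and all the names are assumptions, not details taken from the question.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical log record: pixel coordinates plus the array index written.
data Write = Write Int Int Int deriving (Eq, Show)

-- Parse a line of the assumed form "x y index"; any other line is ignored.
parseWrite :: String -> Maybe Write
parseWrite line = case mapM readMaybe (words line) of
  Just [x, y, i] -> Just (Write x y i)
  _              -> Nothing

-- Assumed invariants: coordinates lie inside the image and the index
-- matches a row-major layout.
violates :: Int -> Int -> Write -> Bool
violates width height (Write x y i) =
  x < 0 || x >= width || y < 0 || y >= height || i /= y * width + x

-- All logged writes that break an invariant, in log order.
violations :: Int -> Int -> String -> [Write]
violations width height =
  filter (violates width height) . mapMaybe parseWrite . lines

main :: IO ()
main = do
  logText <- getContents
  -- 640 and 480 are placeholder dimensions.
  mapM_ print (violations 640 480 logText)
```

Piping the program's log into this prints only the offending writes, which is easier than reading thousands of lines by eye.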

dave4420 answered Oct 04 '22 17:10


Replace your known-to-be-unsafe functions with their safe, checked versions. Inspect your logs for the exceptions that will result, and fix your code.
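As a rough sketch of what that might look like, the bounds-checked writeArray from Data.Array.IO can stand in for the raw primitive. The element type, the array size, and the checkedWrite helper are assumptions made for illustration only.

```haskell
import Control.Exception (SomeException, try)
import Data.Array.IO (IOArray, newArray, writeArray)

-- Hypothetical helper: attempt a bounds-checked write and log a failure
-- instead of silently corrupting memory.
checkedWrite :: IOArray Int Int -> Int -> Int -> IO Bool
checkedWrite arr i v = do
  r <- try (writeArray arr i v) :: IO (Either SomeException ())
  case r of
    Right () -> return True
    Left e   -> do
      putStrLn ("bad write at index " ++ show i ++ ": " ++ show e)
      return False

main :: IO ()
main = do
  -- A ten-element buffer with valid indices 0..9 (assumed size).
  arr <- newArray (0, 9) 0 :: IO (IOArray Int Int)
  ok  <- checkedWrite arr 3 255   -- in range
  bad <- checkedWrite arr 10 255  -- one past the end
  print (ok, bad)
```

An out-of-range index then shows up as a logged exception at the moment of the write, rather than as a crash at some later point.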

Don Stewart answered Oct 04 '22 17:10


Maybe the difference between the test program and the program with the library is that in the latter case there is more allocation, so GC is called more frequently.

"The copy of the array that the reader thread looks at has been unsafe-frozen, while the writer thread continues to concurrently mutate it."

Probably the GC cannot tell that the mutable array is still referenced after freezing. In that case the GC might move the frozen array while writeArray# still performs its write through the old pointer.
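If that guess is right, one way to test it would be to give the reader a real copy via the safe freeze instead of an unsafe-frozen alias. The following is a minimal sketch under that assumption; the buffer size, the element type, and the snapshot helper are invented for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)
import Data.Array (Array, elems)
import Data.Array.IO (IOArray, freeze, newArray, writeArray)

-- Hypothetical helper: the safe freeze copies the array, so later writes
-- to the mutable original cannot touch the snapshot.
snapshot :: IOArray Int Int -> IO (Array Int Int)
snapshot = freeze

main :: IO ()
main = do
  buf  <- newArray (0, 99) 0 :: IO (IOArray Int Int)
  done <- newEmptyMVar
  -- Writer thread: fills the pixel buffer in place.
  _ <- forkIO $ do
    forM_ [0 .. 99] $ \i -> writeArray buf i 255
    putMVar done ()
  -- Reader: works on a copy, never on memory the writer is mutating.
  frame <- snapshot buf
  print (length (elems frame))
  takeMVar done
```

The copy costs time and memory per frame, and the snapshot may still contain a mix of old and new pixels, but the reader no longer shares storage with the writer. If the crash disappears with this change, the unsafe freeze is the likely culprit.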

Boris answered Oct 04 '22 16:10