Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sorting integers fast in haskell

Is there any function in haskell libraries that sorts integers in O(n) time?? [By, O(n) I mean faster than comparison sort and specific for integers]

Basically I find that the following code takes a lot of time with the sort (as compared to summing the list without sorting) :

import System.Random
import Control.DeepSeq
import Data.List (sort)

genlist gen = id $!! sort $!! take (2^22) ((randoms gen)::[Int])

main = do
    gen <- newStdGen
    putStrLn $ show $ sum $ genlist gen

Summing a list doesn't require deepseq but what I am trying for does, but the above code is good enough for the pointers I am seeking.

Time : 6 seconds (without sort); about 35 seconds (with sort)

Memory : about 80 MB (without sort); about 310 MB (with sort)

Note 1 : memory is a bigger issue than time for me here as for the task at hand I am getting out of memory errors (memory usage becomes 3GB! after 30 minutes of run-time)

I am assuming faster algorithms will provide bettor memory print too, hence looking for O(n) time.

Note 2 : I am looking for fast algorithms for Int64, though fast algorithms for other specific types will also be helpful.


Solution Used : IntroSort with unboxed vectors was good enough for my task:

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I

sort :: [Int] -> [Int]
sort = V.toList . V.modify I.sort . V.fromList
like image 389
Karan Avatar asked Apr 01 '12 13:04

Karan


3 Answers

I would consider using vectors instead of lists for this, as lists have a lot of overhead per-element while an unboxed vector is essentially just a contiguous block of bytes. The vector-algorithms package contains various sorting algorithms you can use for this, including radix sort, which I expect should do well in your case.

Here's a simple example, though it might be a good idea to keep the result in vector form if you plan on doing further processing on it.

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Radix as R

sort :: [Int] -> [Int]
sort = V.toList . V.modify R.sort . V.fromList

Also, I suspect that a significant portion of the run time of your example is coming from the random number generator, as the standard one isn't exactly known for its performance. You should make sure that you're timing only the sorting part, and if you need a lot of random numbers in your program, there are faster generators available on Hackage.

like image 166
hammar Avatar answered Sep 17 '22 23:09

hammar


The idea to sort the numbers using an array is the right one for reducing the memory usage.

However, using the maximum and minimum of the list as bounds may cause exceeding memory usage or even a runtime failure when maximum xs - minimum xs > (maxBound :: Int).

So I suggest writing the list contents to an unboxed mutable array, sorting that inplace (e.g. with quicksort), and then building a list from that again.

import System.Random
import Control.DeepSeq
import Data.Array.Base (unsafeRead, unsafeWrite)
import Data.Array.ST
import Control.Monad.ST

myqsort :: STUArray s Int Int -> Int -> Int -> ST s ()
myqsort a lo hi
   | lo < hi   = do
       let lscan p h i
               | i < h = do
                   v <- unsafeRead a i
                   if p < v then return i else lscan p h (i+1)
               | otherwise = return i
           rscan p l i
               | l < i = do
                   v <- unsafeRead a i
                   if v < p then return i else rscan p l (i-1)
               | otherwise = return i
           swap i j = do
               v <- unsafeRead a i
               unsafeRead a j >>= unsafeWrite a i
               unsafeWrite a j v
           sloop p l h
               | l < h = do
                   l1 <- lscan p h l
                   h1 <- rscan p l1 h
                   if (l1 < h1) then (swap l1 h1 >> sloop p l1 h1) else return l1
               | otherwise = return l
       piv <- unsafeRead a hi
       i <- sloop piv lo hi
       swap i hi
       myqsort a lo (i-1)
       myqsort a (i+1) hi
   | otherwise = return ()


genlist gen = runST $ do
    arr <- newListArray (0,2^22-1) $ take (2^22) (randoms gen)
    myqsort arr 0 (2^22-1)
    let collect acc 0 = do
            v <- unsafeRead arr 0
            return (v:acc)
        collect acc i = do
            v <- unsafeRead arr i
            collect (v:acc) (i-1)
    collect [] (2^22-1)

main = do
    gen <- newStdGen
    putStrLn $ show $ sum $ genlist gen

is reasonably fast and uses less memory. It still uses a lot of memory for the list, 222Ints take 32MB storage raw (with 64-bit Ints), with the list overhead of iirc five words per element, that adds up to ~200MB, but less than half of the original.

like image 31
Daniel Fischer Avatar answered Sep 17 '22 23:09

Daniel Fischer


This is taken from Richard Bird's book, Pearls of Functional Algorithm Design, (though I had to edit it a little, as the code in the book didn't compile exactly as written).

import Data.Array(Array,accumArray,assocs)  

sort :: [Int] -> [Int]
sort xs = concat [replicate k x | (x,k) <- assocs count]
        where count :: Array Int Int 
              count = accumArray (+) 0 range (zip xs (repeat 1))
              range = (0, maximum xs)

It works by creating an Array indexed by integers where the values are the number of times each integer occurs in the list. Then it creates a list of the indexes, repeating them the same number of times they occurred in the original list according to the counts.

You should note that it is linear with the maximum value in the list, not the length of the list, so a list like [ 2^x | x <- [0..n] ] would not be sorted linearly.

like image 32
Peter Hall Avatar answered Sep 19 '22 23:09

Peter Hall