I'm working on a terrain generation algorithm for a MineCraft-like world. Currently, I'm using simplex noise based on the implementation in the paper 'Simplex Noise Demystified' [PDF], since simplex noise is supposed to be faster and to have fewer artifacts than Perlin noise. This looks fairly decent (see image), but so far it's also pretty slow.
Running the noise function 10 times (I need noise with different wavelengths for things like terrain height, temperature, tree location, etc.) with 3 octaves of noise for each block in a chunk (16x16x128 blocks), or about 1 million calls to the noise function in total, takes about 700-800 ms. This is at least an order of magnitude too slow for the purposes of generating terrain with any decent kind of speed, despite the fact that there are no obvious expensive operations in the algorithm (at least to me). Just floor, modulo, some array lookups and basic arithmetic. The algorithm (written in Haskell) is listed below. The SCC comments are for profiling. I've omitted the 2D noise functions, since they work the same way.
g3 :: (Floating a, RealFrac a) => a g3 = 1/6 {-# INLINE int #-} int :: (Integral a, Num b) => a -> b int = fromIntegral grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a) grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)] {-# INLINE dot3 #-} dot3 :: Num a => (a, a, a) -> a -> a -> a -> a dot3 (a,b,c) x y z = a * x + b * y + c * z {-# INLINE fastFloor #-} fastFloor :: RealFrac a => a -> Int fastFloor x = truncate (if x > 0 then x else x - 1) --Generate a random permutation for use in the noise functions perm :: Int -> Permutation perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed --Generate 3D noise between -0.5 and 0.5 simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a simplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255) gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12 gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12 gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12 gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z' harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1) --Generate harmonic 3D noise between -0.5 and 0.5 harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a harmonicNoise3D p octaves l x y z = harmonic octaves (\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))
For profiling, I used the following code,
q _ = let p = perm 0 in sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]] main = do start <- getCurrentTime print $ q () end <- getCurrentTime print $ diffUTCTime end start
which produces the following information:
COST CENTRE MODULE %time %alloc simplex3D Main 18.8 21.0 n Main 18.0 19.6 out Main 10.1 9.2 harmonicNoise3D Main 9.8 4.5 harmonic Main 6.4 5.8 int Main 4.0 2.9 gi3 Main 4.0 3.0 xyz2 Main 3.5 5.9 gi1 Main 3.4 3.4 gi0 Main 3.4 2.7 fastFloor Main 3.2 0.6 xyz1 Main 2.9 5.9 ijk Main 2.7 3.5 gi2 Main 2.7 3.3 xyz3 Main 2.6 4.1 iijjkk Main 1.6 2.5 dot3 Main 1.6 0.7
To compare, I also ported the algorithm to C#. Performance there was about 3 to 4 times faster, so I imagine I must be doing something wrong. But even then it's not nearly as fast as I would like. So my question is this: can anyone tell me if there are any ways to speed up my implementation and/or the algorithm in general or does anyone know of a different noise algorithm that has better performance characteristics but a similar look?
Update:
After following some of the suggestions offered below, the code now looks as follows:
module Noise ( Permutation, perm , noise3D, simplex3D ) where import Data.Bits import qualified Data.Vector.Unboxed as UV import System.Random import System.Random.Shuffle type Permutation = UV.Vector Int g3 :: Double g3 = 1/6 {-# INLINE int #-} int :: Int -> Double int = fromIntegral grad3 :: UV.Vector (Double, Double, Double) grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)] {-# INLINE dot3 #-} dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double dot3 (a,b,c) x y z = a * x + b * y + c * z {-# INLINE fastFloor #-} fastFloor :: Double -> Int fastFloor x = truncate (if x > 0 then x else x - 1) --Generate a random permutation for use in the noise functions perm :: Int -> Permutation perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed --Generate 3D noise between -0.5 and 0.5 noise3D :: Permutation -> Double -> Double -> Double -> Double noise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255) gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12 gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12 gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12 gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = (\a -> if a < 0 then 0 else a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z' harmonic :: Int -> (Double -> Double) -> Double harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1) --3D simplex noise --syntax: simplex3D permutation number_of_octaves wavelength x y z simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double simplex3D p octaves l x y z = harmonic octaves (\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))
Together with reducing my chunk size to 8x8x128, generating new terrain chunks now occurs at about 10-20 fps, which means moving around is now not nearly as problematic as before. Of course, any other performance improvements are still welcome.
The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to Double
, so GHC (and LLVM) have a chance of applying more aggressive optimizations.
Note, for those trying to reproduce, this code imports:
import qualified Data.Vector as V import Data.Bits import Data.Time.Clock import System.Random import System.Random.Shuffle type Permutation = V.Vector Int
Ok. There's lots of things you can try to improve this code.
Improvements
Data representation
(a,a,a)
with unboxed triple T !Double !Double !Double
Data.Array
to Data.Array.Unboxed
for Permutations
repa
packageCompiler flags
-O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native
(or equivalent with -fllvm
)-fspec-constr-count=16
More efficient library functions
mod
with rem
V.!
indexing with unchecked indexing VU.unsafeIndex
(after moving to Data.Vector.Unboxed
Runtime settings
-A20M
or -H
Also, check your algorithm is identical to the C# one, and you're using the same data structures.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With