Is there a limit to how long a constructor name can be? What are the consequences of having absurdly long constructor names?
If we check the GHC source, we can find the type used to represent data constructors. It is named DataCon, and it has the following field:
dcName :: Name, -- This is the name of the *source data con*
Going down the rabbit hole, Name contains an OccName:
n_occ :: !OccName, -- Its occurrence name
An OccName contains a FastString for the name:

data OccName = OccName
    { occNameSpace :: !NameSpace
    , occNameFS    :: !FastString
    }
    deriving Typeable
Finally, a FastString is just a ByteString, together with a precalculated length and an Int that tags it for quick comparison:

data FastString = FastString {
      uniq    :: {-# UNPACK #-} !Int, -- unique id
      n_chars :: {-# UNPACK #-} !Int, -- number of chars
      fs_bs   :: {-# UNPACK #-} !ByteString,
      fs_ref  :: {-# UNPACK #-} !(IORef (Maybe FastZString))
  } deriving Typeable
The data type itself imposes no limit on the length of the string (other than maxBound :: Int, since the length is stored in an Int). However, that doesn't rule out a bug somewhere else in the compiler that could cause problems with very long names.
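To make the role of the integer tag concrete, here is a minimal toy model. It is not GHC's code: ToyFastString, mkToy and the field names are made up for illustration, and the tag is supplied by hand, whereas GHC assigns it when the string is interned. The sketch only shows why comparing by tag costs the same no matter how long the name is.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS

-- Toy stand-in for a FastString-like type (hypothetical, not GHC's definition).
data ToyFastString = ToyFastString
  { toyUniq   :: !Int            -- tag used for comparison
  , toyNChars :: !Int            -- cached character count
  , toyBytes  :: !BS.ByteString  -- the actual contents
  }

-- Equality looks only at the tag, so it never walks the bytes.
instance Eq ToyFastString where
  a == b = toyUniq a == toyUniq b

-- Hypothetical helper: the caller picks the tag. This is an assumption of
-- the sketch; a real interning table would hand out tags itself.
mkToy :: Int -> String -> ToyFastString
mkToy tag s = ToyFastString tag (length s) (BS.pack s)

main :: IO ()
main = do
  let long  = mkToy 1 (replicate 1000000 'F')
      other = mkToy 2 "F"
  print (toyNChars long)           -- cached length of the long name
  print (BS.length (toyBytes long))
  print (long == long)             -- same tag
  print (long == other)            -- different tags
```

The length field is an Int here as well, which is where the maxBound :: Int ceiling mentioned above comes from.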
So we need a program to test whether GHC actually copes with very long constructor names:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Control.Applicative ((<$>))
import Control.Monad (forM_)
import System.IO (hPutStr, hFileSize, hClose)
import System.Exit (ExitCode(..))
import System.IO.Temp (withSystemTempFile)
import Data.Time.Clock.POSIX (getPOSIXTime)
import System.Process (readProcessWithExitCode)

-- timing functions (from criterion)
getTime :: IO Double
getTime = (fromRational . toRational) `fmap` getPOSIXTime

time :: IO a -> IO (Double, a)
time act = do
  start <- getTime
  result <- act
  end <- getTime
  let !delta = end - start
  return (delta, result)

-- make a constructor like
-- data C = FFFFFF
makeConstructor :: Int -> String
makeConstructor size = "data C = " ++ replicate size 'F'

wrapWithMainModule :: String -> String
wrapWithMainModule code = unlines ["module Main where", "main = return ()", code]

data CompileResults = CompileResults
  { timeTaken      :: Double
  , success        :: Bool
  , outputFileSize :: Integer
  } deriving (Show)

compileHsCode :: String -> IO CompileResults
compileHsCode sourceCode = withSystemTempFile "test.hs" $ \path handle -> do
  withSystemTempFile "output.o" $ \outputPath outputHandle -> do
    hPutStr handle $ wrapWithMainModule sourceCode
    hClose handle
    (timeTaken, (exitCode, _, _)) <- time $
      readProcessWithExitCode "ghc" ["-c", "-o", outputPath, path] ""
    let success = exitCode == ExitSuccess
    size <- if success then hFileSize outputHandle else return 0
    return $ CompileResults { timeTaken = timeTaken
                            , success = success
                            , outputFileSize = size }

testConstructorSizes :: [Int] -> IO ()
testConstructorSizes sizes = forM_ sizes $ \size -> do
  info <- compileHsCode $ makeConstructor size
  putStrLn $ "For Size " ++ show size ++ "\t: " ++ show info

-- Up to 10 million
sizesToTest :: [Int]
sizesToTest = take 7 (iterate (*10) 10)

main = testConstructorSizes $ sizesToTest
Here is the output of running main:

For Size 10 : CompileResults {timeTaken = 0.1390078067779541, success = True, outputFileSize = 1818}
For Size 100 : CompileResults {timeTaken = 0.14700841903686523, success = True, outputFileSize = 2086}
For Size 1000 : CompileResults {timeTaken = 0.1390080451965332, success = True, outputFileSize = 4786}
For Size 10000 : CompileResults {timeTaken = 0.1520085334777832, success = True, outputFileSize = 31786}
For Size 100000 : CompileResults {timeTaken = 0.31201791763305664, success = True, outputFileSize = 301786}
For Size 1000000 : CompileResults {timeTaken = 2.26712965965271, success = True, outputFileSize = 3001786}
For Size 10000000 : CompileResults {timeTaken = 109.2182469367981, success = True, outputFileSize = 30001786}
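A few interesting points:

- Compilation succeeds at every size tested, up to 10 million chars. The time taken stays at roughly 0.14 to 0.15 seconds up to 10,000 chars, then climbs faster than linearly: 0.31 seconds at 100,000 chars, 2.27 seconds at 1,000,000 and 109 seconds at 10,000,000, so the last tenfold increase in size costs roughly 48 times as much time.
- The output file size is given by 1786 + (constructorSize * 3) bytes. This matches the measurements exactly from 100 chars upward (the 10-char case is 2 bytes larger). So each char takes up three bytes when used in a constructor.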
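As a quick sanity check of that formula, the sketch below compares 1786 + 3 * constructorSize against the file sizes reported in the output above. The names measurements and predictedSize are mine, introduced only for this check; the numbers are copied from the run shown earlier.

```haskell
module Main where

-- (constructor length, outputFileSize in bytes), copied from the run above
measurements :: [(Int, Integer)]
measurements =
  [ (10,       1818)
  , (100,      2086)
  , (1000,     4786)
  , (10000,    31786)
  , (100000,   301786)
  , (1000000,  3001786)
  , (10000000, 30001786)
  ]

-- The formula from the answer: a fixed 1786 bytes plus 3 bytes per char.
predictedSize :: Int -> Integer
predictedSize n = 1786 + 3 * fromIntegral n

main :: IO ()
main = mapM_ report measurements
  where
    report (n, actual) =
      putStrLn $ show n
        ++ "\tpredicted " ++ show (predictedSize n)
        ++ "\tactual " ++ show actual
        ++ "\tdifference " ++ show (actual - predictedSize n)
```

The difference column is 2 for the 10-char constructor and 0 for every larger size.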