In this programming problem, the input is an n
×m
integer matrix. Typically, n
≈ 105 and m
≈ 10. The official solution (1606D, Tutorial) is quite imperative: it involves some matrix manipulation, precomputation and aggregation. For fun, I took it as an STUArray implementation exercise.
I have managed to implement it using STUArray, but still the program takes way more memory than permitted (256MB). Even when run locally, the maximum resident set size is >400 MB. On profiling, reading from stdin seems to be dominating the memory footprint:
Functions readv
and readv.readInt
, responsible for parsing integers and saving them into a 2D list, are taking around 50-70 MB, as opposed to around 16 MB = (106 integers) × (8 bytes per integer + 8 bytes per link).
Is there a hope I can get the total memory below 256 MB? I'm already using Text
package for input. Maybe I should avoid lists altogether and directly read integers from stdin to the array. How can we do that? Or, is the issue elsewhere?
{-# OPTIONS_GHC -O2 #-}
module CF1606D where
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import qualified Data.Text.Read as TR
import Control.Monad
import qualified Data.List as DL
import qualified Data.IntSet as DS
import Control.Monad.ST
import Data.Array.ST.Safe
import Data.Int (Int32)
import Data.Array.Unboxed
solve :: IO ()
solve = do
~[n,m] <- readv
-- 2D list
input <- {-# SCC input #-} replicateM (fromIntegral n) readv
let
ints = [1..]
sorted = DL.sortOn (head.fst) (zip input ints)
(rows,indices) = {-# SCC rows_inds #-} unzip sorted
-- 2D list converted into matrix:
matrix = mat (fromIntegral n) (fromIntegral m) rows
infinite = 10^7
asc x y = [x,x+1..y]
desc x y = [y,y-1..x]
-- Four prefix-matrices:
tlMax = runSTUArray $ prefixMat max 0 asc asc (subtract 1) (subtract 1) =<< matrix
blMin = runSTUArray $ prefixMat min infinite desc asc (+1) (subtract 1) =<< matrix
trMin = runSTUArray $ prefixMat min infinite asc desc (subtract 1) (+1) =<< matrix
brMax = runSTUArray $ prefixMat max 0 desc desc (+1) (+1) =<< matrix
good _ (i,j)
| tlMax!(i,j) < blMin!(i+1,j) && brMax!(i+1,j+1) < trMin!(i,j+1) = Left (i,j)
| otherwise = Right ()
{-# INLINABLE good #-}
nearAns = foldM good () [(i,j)|i<-[1..n-1],j<-[1..m-1]]
ans = either (\(i,j)-> "YES\n" ++ color n (take i indices) ++ " " ++ show j) (const "NO") nearAns
putStrLn ans
type I = Int32
type S s = (STUArray s (Int, Int) I)
type R = Int -> Int -> [Int]
type F = Int -> Int
mat :: Int -> Int -> [[I]] -> ST s (S s)
mat n m rows = newListArray ((1,1),(n,m)) $ concat rows
prefixMat :: (I->I->I) -> I -> R -> R -> F -> F -> S s -> ST s (S s)
prefixMat opt worst ordi ordj previ prevj mat = do
((ilo,jlo),(ihi,jhi)) <- getBounds mat
pre <- newArray ((ilo-1,jlo-1),(ihi+1,jhi+1)) worst
forM_ (ordi ilo ihi) $ \i-> do
forM_ (ordj jlo jhi) $ \j -> do
matij <- readArray mat (i,j)
prei <- readArray pre (previ i,j)
prej <- readArray pre (i, prevj j)
writeArray pre (i,j) (opt (opt prei prej) matij)
return pre
color :: Int -> [Int] -> String
color n inds = let
temp = DS.fromList inds
colors = [if DS.member i temp then 'B' else 'R' | i<-[1..n]]
in colors
readv :: Integral t => IO [t]
readv = map readInt . T.words <$> TI.getLine where
readInt = fromIntegral . either (const 0) fst . TR.signed TR.decimal
{-# INLINABLE readv #-}
main :: IO ()
main = do
~[n] <- readv
replicateM_ n solve
Quick description of the code above:
n
rows each having m
integers.Sample input and Commands
Sample input: inp3.txt.
Command:
> stack ghc -- -main-is CF1606D.main -with-rtsopts="-s -h -p -P" -rtsopts -prof -fprof-auto CF1606D
> gtime -v ./CF1606D < inp3.txt > outp
...
...
MUT time 2.990s ( 3.744s elapsed) # RTS -s output
GC time 4.525s ( 6.231s elapsed) # RTS -s output
...
...
Maximum resident set size (kbytes): 408532 # >256 MB (gtime output)
> stack exec -- hp2ps -t0.1 -e8in -c CF1606D.hp && open CF1606D.ps
Question about GC: As shown above in the +RTS -s output, GC seems to be taking longer than the actual logic execution. Is this normal? Is there a way to visualize the GC activity over time? I tried making matrices strict but that didn't have any impact.
Probably this is not a functional-friendly problem at all (although I'll be happy to be disproved on this). For example, Java uses GC too but there are lots of successful Java submissions. Still, I want to see how far I can push. Thanks!
Contrary to common belief Haskell is quite friendly with respect to problems like that. The real issue is that the array
library that comes with GHC is total garbage. Another big problem is that everyone is taught in Haskell to use lists where arrays should be used instead, which is usually one of the major sources of slow code and memory bloated programs. So, it is not surprising that GC takes a long time, it is because there is way too much stuff being allocation. Here is a run on the supplied input for the solution provided below:
1,483,547,096 bytes allocated in the heap
566,448 bytes copied during GC
18,703,640 bytes maximum residency (3 sample(s))
1,223,400 bytes maximum slop
32 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1399 colls, 0 par 0.009s 0.009s 0.0000s 0.0011s
Gen 1 3 colls, 0 par 0.002s 0.002s 0.0006s 0.0016s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.484s ( 0.517s elapsed)
GC time 0.011s ( 0.011s elapsed)
EXIT time 0.001s ( 0.002s elapsed)
Total time 0.496s ( 0.530s elapsed)
The solution provided below uses an array library massiv
, which makes it impossible to submit to codeforces. However, hopefully the goal is to get better at Haskell, rather than get points on some website.
The red-blue matrix can be separated into two stages: read and solve
In the main
function we only read total number of arrays and dimensions for each array. Also we print the outcome. Nothing exciting here. (Note that the linked file inp3.txt
has a larger array than the limits defined in the problem: n*m <= 10^6
)
import Control.Monad.ST
import Control.Monad
import qualified Data.ByteString as BS
import Data.Massiv.Array as A hiding (B)
import Data.Massiv.Array.Mutable.Algorithms (quicksortByM_)
import Control.Scheduler (trivialScheduler_)
main :: IO ()
main = do
t <- Prelude.read <$> getLine
when (t < 1 || t > 1000) $ error $ "Invalid t: " ++ show t
replicateM_ t $ do
dimsStr <- getLine
case Prelude.map Prelude.read (words dimsStr) of
-- Test file fails this check: && n * m <= 10 ^ (6 :: Int) -> do
[n, m] | n >= 2 && m > 0 && m <= 5 * 10 ^ (5 :: Int) -> do
mat <- readMatrix n m
case solve mat of
Nothing -> putStrLn "NO"
Just (ix, cs) -> do
putStrLn "YES"
putStr $ foldMap show cs
putStr " "
print ix
_ -> putStrLn $ "Unexpected dimensions: " ++ show dimsStr
Loading the input into array is the major source of problems int the original question:
text
, ascii characters is the only valid input expected by the problem.Normally in such situation it would be much better to read input in a streaming fashion using something like conduit
. In particular, reading input as stream of bytes and parsing those bytes as numbers would be the optimal solution. That being said there are hard requirements on the width of each array in the description of the problem, so we can get away with reading input line-by-line as a ByteString
and then parsing numbers (assumed unsigned for simplicity) in each line and write those numbers into array at the same time. This ensures that at this stage we will only have allocated the resulting array and a single line as sequence of bytes. This could be done cleaner with a parsing library like attoparsec
, but problem is simple enough to just do it adhoc.
type Val = Word
readMatrix :: Int -> Int -> IO (Matrix P Val)
readMatrix n m = createArrayS_ (Sz2 n m) readMMatrix
readMMatrix :: MMatrix RealWorld P Val -> IO ()
readMMatrix mat =
loopM_ 0 (< n) (+ 1) $ \i -> do
line <- BS.getLine
--- ^ reads at most 10Mb because it is known that input will be at most
-- 5*10^5 Words: 19 digits max per Word and one for space: 5*10^5 * 20bytes
loopM 0 (< m) (+ 1) line $ \j bs ->
let (word, bs') = parseWord bs
in bs' <$ write_ mat (i :. j) word
where
Sz2 n m = sizeOfMArray mat
isSpace = (== 32)
isDigit w8 = w8 >= 48 && w8 <= 57
parseWord bs =
case BS.uncons bs of
Just (w8, bs')
| isDigit w8 -> parseWordLoop (fromIntegral (w8 - 48)) bs'
| otherwise -> error $ "Unexpected byte: " ++ show w8
Nothing -> error "Unexpected end of input"
parseWordLoop !acc bs =
case BS.uncons bs of
Nothing -> (acc, bs)
Just (w8, bs')
| isSpace w8 -> (acc, bs')
| isDigit w8 -> parseWordLoop (acc * 10 + fromIntegral (w8 - 48)) bs'
| otherwise -> error $ "Unexpected byte: " ++ show w8
This is the step where we implement the actual solution. Instead of going into trying to fix the solution provided in this SO question I went on and translated the C++ solution that was linked in the question instead. Reason I went that route is twofold:
Note, that it should be possible to rewrite the solution below with array
package, because in the end all that is needed are the read
, write
and allocate
operations.
computeSortBy ::
(Load r Ix1 e, Manifest r' e)
=> (e -> e -> Ordering)
-> Vector r e
-> Vector r' e
computeSortBy f vec =
withLoadMArrayST_ vec $ quicksortByM_ (\x y -> pure $ f x y) trivialScheduler_
solve :: Matrix P Val -> Maybe (Int, [Color])
solve a = runST $ do
let sz@(Sz2 n m) = size a
ord :: Vector P Int
ord = computeSortBy
(\x y -> compare (a ! (y :. 0)) (a ! (x :. 0))) (0 ..: n)
mxl <- newMArray @P sz minBound
loopM_ (n - 1) (>= 0) (subtract 1) $ \ i ->
loopM_ 0 (< m) (+ 1) $ \j -> do
writeM mxl (i :. j) (a ! ((ord ! i) :. j))
when (i < n - 1) $
writeM mxl (i :. j)
=<< max <$> readM mxl (i :. j) <*> readM mxl (i + 1 :. j)
when (j > 0) $
writeM mxl (i :. j)
=<< max <$> readM mxl (i :. j) <*> readM mxl (i :. j - 1)
mnr <- newMArray @P sz maxBound
loopM_ (n - 1) (>= 0) (subtract 1) $ \ i ->
loopM_ (m - 1) (>= 0) (subtract 1) $ \ j -> do
writeM mnr (i :. j) (a ! ((ord ! i) :. j))
when (i < n - 1) $
writeM mnr (i :. j)
=<< min <$> readM mnr (i :. j) <*> readM mnr (i + 1 :. j)
when (j < m - 1) $
writeM mnr (i :. j)
=<< min <$> readM mnr (i :. j) <*> readM mnr (i :. j + 1)
mnl <- newMArray @P (Sz m) maxBound
mxr <- newMArray @P (Sz m) minBound
let goI i
| i < n - 1 = do
loopM_ 0 (< m) (+ 1) $ \j -> do
val <- min (a ! ((ord ! i) :. j)) <$> readM mnl j
writeM mnl j val
when (j > 0) $
writeM mnl j . min val =<< readM mnl (j - 1)
loopM_ (m - 1) (>= 0) (subtract 1) $ \j -> do
val <- max (a ! ((ord ! i) :. j)) <$> readM mxr j
writeM mxr j val
when (j < m - 1) $
writeM mxr j . max val =<< readM mxr (j + 1)
let goJ j
| j < m - 1 = do
mnlVal <- readM mnl j
mxlVal <- readM mxl (i + 1 :. j)
mxrVal <- readM mxr (j + 1)
mnrVal <- readM mnr ((i + 1) :. (j + 1))
if mnlVal > mxlVal && mxrVal < mnrVal
then pure $ Just (i, j)
else goJ (j + 1)
| otherwise = pure Nothing
goJ 0 >>= \case
Nothing -> goI (i + 1)
Just pair -> pure $ Just pair
| otherwise = pure Nothing
mAns <- goI 0
Control.Monad.forM mAns $ \ (ansFirst, ansSecond) -> do
resVec <- createArrayS_ @BL (Sz n) $ \res ->
iforM_ ord $ \i ordIx -> do
writeM res ordIx $! if i <= ansFirst then R else B
pure (ansSecond + 1, A.toList resVec)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With