How to implement a run length encoding modulus n>=1
? For n=4
, considering the inputAAABBBBABCCCCBBBDAAA
, we want an output of [('D', 1), ('A', 3)]
. Note the long-distance merging due to the modulus operation. See Explanation.
The first occurance of BBBB
encodes to (B, 4)
whose modulus 4
is (B, 0)
, thus canceling itself out. See the diagram (ignore spaces; they are simply for illustrative purposes):
AAABBBBABCCCCBBBDAAA
A3 B4 ABCCCCBBBDAAA
A3 B0 ABCCCCBBBDAAA
A3 ABCCCCBBBDAAA
A4 BCCCCBBBDAAA
A0 BCCCCBBBDAAA
BCCCCBBBDAAA
...
DA3
A simpler example when no merging happens since none gets canceled by modulus 4
: input AAABABBBC
produces output [('A',3),('B',1),('A',1),('B',3),('C',1)]
.
I implemented this in Haskell, but it looks too verbose and awful to read. The key idea is to check three tuples at a time, and only advance one tuple forward if we can neither cancel out 0
tuples nor merge a pair of tuples among the three tuples at hand.
import Data.List (group)
test = [('A', 1), ('A', 2), ('B', 2), ('B', 2), ('A', 1), ('B', 1), ('C', 1), ('C', 3), ('B', 3), ('D', 1), ('A', 3)] :: [(Char, Int)]
expected = [('D', 1), ('A', 3)] :: [(Char, Int)]
reduce' :: [(Char, Int)] -> [(Char, Int)]
reduce' [ ] = [] -- exit
reduce' ( (_,0):xs) = reduce' xs
reduce' (x1:(_,0):xs) = reduce' (x1:xs)
reduce' ( (x,n):[]) = (x,n):[] -- exit
reduce' ( (x1,n1):(x2,n2):[]) -- [previous,current,NONE]
| x1 == x2 = reduce' ((x1, d4 (n1+n2)):[])
| otherwise = (x1,n1):( -- advance
reduce' ((x2, d4 n2 ):[]))
reduce' ((x1,n1):(x2,n2):(x3,n3):xs) -- [previous,current,next]
| n3 == 0 = reduce' ((x1, d4 n1 ):(x2, d4 n2 ):xs)
| n2 == 0 = reduce' ((x1, d4 n1 ):(x3, d4 n3 ):xs)
| x2 == x3 = reduce' ((x1, d4 n1 ):(x2, d4 (n2+n3)):xs)
| x1 == x2 = reduce' ((x2, d4 (n1+n2)):(x3, d4 n3 ):xs)
| otherwise = (x1,n1):( -- advance
reduce' ((x2, d4 n2 ):(x3, d4 n3 ):xs)
)
-- Helpers
flatten :: [(Char, Int)] -> String
flatten nested = concat $ (\(x, n) -> replicate n x) <$> nested
nest :: String -> [(Char, Int)]
nest flat = zip (head <$> xg) (d4 .length <$> xg)
where xg = group flat
reduce = reduce' . nest
d4 = (`rem` 4)
My inputs are like the test
variable in the snipped above. We could keep doing flatten
then nest
until its result doesn't change, and would definitely look simpler. But it feels it is scanning the whole list many times, while my 3-pointer implementation scans the whole list only once. Maybe we can pop an element from left and add it to a new stack while merging identical consecutive items? Or maybe use Applicative Functors? E.g. this works but not sure about its efficiency/performance: reduce = (until =<< ((==) =<<)) (nest . flatten)
.
Run-length encoding (RLE) is a lossless compression method where sequences that display redundant data are stored as a single data value representing the repeated block and how many times it appears in the image. Later, during decompression, the image can be reconstructed exactly from this information.
In 1983, run-length encoding was patented by Hitachi. RLE is particularly well suited to palette-based bitmap images such as computer icons, and was a popular image compression method on early online services such as CompuServe before the advent of more sophisticated formats such as GIF.
Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs.
Run-Length Encoding (RLE) Encoding this with a 3-bit count and the 1 bit value, the encoding is 0-110 1-111 1-100 0-111 The compression ratio is (24 - 16) / 24 = 1/3. RLE is lossless. RLE is good for compressing images with large uniform areas (scanned text: 8-to-1 compression).
I think you are making this problem much harder by thinking of it in terms of character strings at all. Instead, do a preliminary pass that just does the boring RLE part. This way, a second pass is comparatively easy, because you can work in "tokens" that represent a run of a certain length, instead of having to work one character at a time.
The only data structure we need to maintain as we do the second pass through the list is a stack, and we only ever need to look at its top element. We compare each token that we're examining with the top of the stack. If they're the same, we blend them into a single token representing their concatenation; otherwise, we simply push the next token onto the stack. In either case, we reduce token sizes mod N and drop tokens with size 0.
CCCBBBAAA...
.Writing down my remark about laziness made me think of your reduce
solution, which appears to produce output lazily, which I thought was impossible. The explanation, it turns out, is that your implementation is not just inelegant, as you say, but also incorrect. It produces output too soon, missing chances to cancel with later elements. The simplest test case I can find that you fail is reduce "ABABBBBAAABBBAAA" == [('A',1),('A',3)]
. We can confirm that this is due to yielding results too early, by noting that take 1 $ reduce ("ABAB" ++ undefined)
yields [(1, 'A')]
even though elements might come later that cancel with that first A.
Finally note that I use a custom data type Run
just to give a name to the concept; of course you can convert this to a tuple cheaply, or rewrite the function to use tuples internally if you prefer.
import Data.List (group)
data Run a = Run Int a deriving Show
modularRLE :: Eq a => Int -> [a] -> [Run a]
modularRLE groupSize = go [] . tokenize
where go stack [] = reverse stack
go stack (Run n x : remainder) = case stack of
[] -> go (blend n []) remainder
(Run m y : prev) | x == y -> go (blend (n + m) prev) remainder
| otherwise -> go (blend n stack) remainder
where blend i s = case i `mod` groupSize of
0 -> s
j -> Run j x : s
tokenize xs = [Run (length run) x | run@(x:_) <- group xs]
λ> modularRLE 4 "AAABBBBABCCCCBBBDAAA"
[Run 1 'D',Run 3 'A']
λ> modularRLE 4 "ABABBBBAAABBBAAA"
[]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With