
Lightweight (de)compression algorithm for embedded use

I have a low-resource embedded system with a graphical user interface. The interface requires font data. To conserve read-only memory (flash), the font data needs to be compressed. I am looking for an algorithm for this purpose.

Properties of the data to be compressed

  • transparency data for a rectangular pixel map with 8 bits per pixel
  • there are typically around 200..300 glyphs in a font (a typeface sampled at a certain size)
  • each glyph is typically from 6x9 to 15x20 pixels in size
  • there are a lot of zeros ("no ink") and somewhat fewer 255s ("completely inked"); otherwise the distribution of octets is quite even due to the nature of anti-aliasing

Requirements for the compression algorithm

  • The important metric for the decompression algorithm is the size of the data plus the size of the algorithm (as they will reside in the same limited memory).
  • There is very little RAM available for the decompression; it is possible to decompress the data for a single glyph into RAM but not much more.
  • To make things more difficult, the algorithm has to be very fast on a 32-bit microcontroller (ARM Cortex-M core), as the glyphs need to be decompressed while they are being drawn onto the display. Ten or twenty machine cycles per octet is OK; a hundred is certainly too much.
  • To make things easier, the complete corpus of data is known a priori, and there is a lot of processing power and memory available during the compression phase.

Conclusions and thoughts

  • The naïve approach of just packing each octet by some variable-length encoding does not give good results due to the relatively high entropy.
  • Any algorithm taking advantage of previously decompressed data seems to be out of the question, as it is not possible to store the decompressed data of other glyphs. This makes LZ algorithms less efficient, as they can only reference a small amount of data.
  • Constraints on the processing power seem to rule out most bitwise operations, i.e. decompression should handle the data octet-by-octet. This makes Huffman coding difficult and arithmetic coding impossible.
  • The problem seems to be a good candidate for static dictionary coding, as all the data is known beforehand and it is somewhat repetitive in nature (different glyphs share the same shapes); a sketch of what such a decoder could look like follows this list.
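To illustrate the kind of decoder I have in mind, the inner loop could look roughly like the following. This is only a sketch: the code layout (255 as a literal escape, every other octet indexing a dictionary entry) and all identifiers are placeholders, not part of an actual implementation.

```c
#include <stdint.h>
#include <stddef.h>

#define DICT_ENTRIES   255   /* codes 0..254 index the dictionary      */
#define LITERAL_ESCAPE 255   /* code 255: the next octet is a literal  */

/* Dictionary blob and per-entry offset/length tables live in flash. */
extern const uint8_t  dict_data[];
extern const uint16_t dict_offset[DICT_ENTRIES];
extern const uint8_t  dict_length[DICT_ENTRIES];

/* Decode one glyph's compressed stream (len octets) into dst;
 * returns the number of samples written. */
size_t glyph_decode(const uint8_t *src, size_t len, uint8_t *dst)
{
    uint8_t *out = dst;
    size_t i = 0;
    while (i < len) {
        uint8_t code = src[i++];
        if (code == LITERAL_ESCAPE) {
            *out++ = src[i++];                 /* verbatim sample */
        } else {
            const uint8_t *e = dict_data + dict_offset[code];
            uint8_t n = dict_length[code];
            while (n--) {
                *out++ = *e++;                 /* copy a flash-resident string */
            }
        }
    }
    return (size_t)(out - dst);
}
```

The per-octet work is only a table lookup and a short copy, which should fit the cycle budget above; the open question is how to choose the dictionary entries.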

Questions

  • How can a good dictionary be constructed? I know that finding the optimal dictionary for given data is an NP-complete problem, but are there any reasonably good approximations? I have tried Zstandard's dictionary builder, but the results were not very good.
  • Is there something in my conclusions that I've gotten wrong? (Am I on the wrong track and omitting something obvious?)

Best algorithm so far

Just to give some background information, the best useful algorithm I have been able to figure out is as follows:

  • All samples in the font data for a single glyph are concatenated (flattened) into a one-dimensional array (vector, table).
  • Each sample has three possible states: 0, 255, and "something else".
  • This information is packed five consecutive samples at a time into a 5-digit base-3 number (0..242).
  • As there are some extra values available in an octet (2^8 = 256, 3^5 = 243), they are used to signify longer strings of 0's and 255's.
  • For each "something else" value the actual value (1..254) is stored in a separate vector.

This data is fast to decompress, as the base-3 values can be decoded into base-4 values by a smallish (243 x 3 = 729 octets) lookup table. The compression ratios are highly dependent on the font size, but with my typical data I can get around 1:2. As this is significantly worse than LZ variants (which get around 1:3), I would like to try the static dictionary approach.
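In rough C, the decode loop looks something like this. For readability the lookup table here is unpacked to 243 x 5 octets (one octet per trit) rather than the packed 243 x 3 layout mentioned above, and the run-length codes are placeholders; only the overall structure matters.

```c
#include <stdint.h>
#include <stddef.h>

/* trit_table[v][k] is the k-th base-3 digit of v, for v in 0..242 */
extern const uint8_t trit_table[243][5];

/* Decode one glyph: 'codes' is the packed stream, 'extras' holds the
 * "something else" sample values (1..254) in order of appearance. */
void glyph_decode(const uint8_t *codes, size_t ncodes,
                  const uint8_t *extras, uint8_t *dst)
{
    for (size_t i = 0; i < ncodes; i++) {
        uint8_t c = codes[i];
        if (c < 243) {
            const uint8_t *t = trit_table[c];
            for (int k = 0; k < 5; k++) {
                switch (t[k]) {
                case 0:  *dst++ = 0;          break;  /* no ink          */
                case 2:  *dst++ = 255;        break;  /* fully inked     */
                default: *dst++ = *extras++;  break;  /* stored verbatim */
                }
            }
        } else {
            /* Codes 243..255: a longer run of 0s or 255s. The mapping
             * from code to run length here is just a placeholder. */
            uint8_t fill = (c & 1) ? 255 : 0;
            int run = 8 + 4 * ((c - 243) >> 1);
            while (run--) {
                *dst++ = fill;
            }
        }
    }
}
```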

Of course, the usual LZ variants use Huffman or arithmetic coding, which naturally makes the compressed data smaller. On the other hand, I have all the data available, and the compression speed is not an issue. This should make it possible to find much better dictionaries.

Due to the nature of the data I might be able to use a lossy algorithm, but in that case the most likely candidate would be reducing the number of quantization levels in the pixel data. That won't change the underlying compression problem much, and I would like to avoid the resulting bit-alignment hassle.

asked Aug 26 '17 by DrV



1 Answer

You could consider using something already developed for a scenario similar to yours:

https://github.com/atomicobject/heatshrink

https://spin.atomicobject.com/2013/03/14/heatshrink-embedded-data-compression/
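For illustration, a per-glyph decode loop built on heatshrink's streaming sink/poll API could look roughly like the sketch below. Treat it as an assumption-laden outline rather than a reference implementation: check the exact function and enum names against heatshrink_decoder.h, and note that heatshrink can also be built with static allocation if malloc is not available.

```c
#include <stdint.h>
#include <stddef.h>
#include "heatshrink_decoder.h"

/* Decompress one glyph (comp/comp_len from flash) into out[out_cap].
 * Returns the number of bytes produced, or 0 on a sink error. */
size_t glyph_decompress(heatshrink_decoder *hsd,
                        const uint8_t *comp, size_t comp_len,
                        uint8_t *out, size_t out_cap)
{
    size_t in_pos = 0, out_pos = 0;
    heatshrink_decoder_reset(hsd);

    while (in_pos < comp_len) {
        size_t count = 0;
        /* Feed a chunk of compressed input... */
        if (heatshrink_decoder_sink(hsd, (uint8_t *)&comp[in_pos],
                                    comp_len - in_pos, &count) < 0) {
            return 0;
        }
        in_pos += count;

        /* ...and drain all output currently available. */
        HSD_poll_res pres;
        do {
            pres = heatshrink_decoder_poll(hsd, &out[out_pos],
                                           out_cap - out_pos, &count);
            out_pos += count;
        } while (pres == HSDR_POLL_MORE);
    }

    /* Flush whatever is still buffered inside the decoder. */
    while (heatshrink_decoder_finish(hsd) == HSDR_FINISH_MORE) {
        size_t count = 0;
        heatshrink_decoder_poll(hsd, &out[out_pos], out_cap - out_pos, &count);
        out_pos += count;
    }
    return out_pos;
}
```

The window and lookahead sizes are configurable at build time, so you can trade compression ratio against the few hundred bytes of RAM the decoder state needs.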

answered Nov 04 '22 by Marcin Gałczyński