image files and python source (version 1 and 2)
Version 1 Here is my first attempt. I will update as I go.
I have got the SO logo down to 300 characters almost lossless. My technique uses conversion to SVG vector art so it works best on line art. It is actually an SVG compressor, it still requires the original art go through a vectorisation stage.
For my first attempt I used an online service for the PNG trace however there are MANY free and non-free tools that can handle this part including potrace (open-source).
Here are the results
Original SO Logo http://www.warriorhut.org/graphics/svg_to_unicode/so-logo.png Original Decoded SO Logo http://www.warriorhut.org/graphics/svg_to_unicode/so-logo-decoded.png After encoding and decoding
Characters: 300
Time: Not measured but practically instant (not including vectorisation/rasterisation steps)
The next stage will be to embed 4 symbols (SVG path points and commands) per unicode character. At the moment my python build does not have wide character support UCS4 which limits my resolution per character. I've also limited the maximum range to the lower end of the unicode reserved range 0xD800 however once I build a list of allowed characters and a filter to avoid them I can theoretically push the required number of characters as low as 70-100 for the logo above.
A limitation of this method at present is the output size is not fixed. It depends on number of vector nodes/points after vectorisation. Automating this limit will require either pixelating the image (which removes the main benefit of vectors) or repeated running the paths through a simplification stage until the desired node count is reached (which I'm currently doing manually in Inkscape).
Version 2
UPDATE: v2 is now qualified to compete. Changes:
Characters: 133
Time: A few seconds
v2 decoded http://www.warriorhut.org/graphics/svg_to_unicode/so-logo-decoded-v2.png After encoding and decoding (version 2)
As you can see there are some artifacts this time. It isn't a limitation of the method but a mistake somewhere in my conversions. The artifacts happen when the points go outside the range 0.0 - 127.0 and my attempts to constrain them have had mixed success. The solution is simply to scale the image down however I had trouble scaling the actual points rather than the artboard or group matrix and I'm too tired now to care. In short, if your points are in the supported range it generally works.
I believe the kink in the middle is due to a handle moving to the other side of a handle it's linked to. Basically the points are too close together in the first place. Running a simplify filter over the source image in advance of compressing it should fix this and shave of some unnecessary characters.
UPDATE: This method is fine for simple objects so I needed a way to simplify complex paths and reduce noise. I used Inkscape for this task. I've had some luck with grooming out unnecessary paths using Inkscape but not had time to try automating it. I've made some sample svgs using the Inkscape 'Simplify' function to reduce the number of paths.
Simplify works ok but it can be slow with this many paths.
autotrace example http://www.warriorhut.org/graphics/svg_to_unicode/autotrace_16_color_manual_reduction.png cornell box http://www.warriorhut.com/graphics/svg_to_unicode/cornell_box_simplified.png lena http://www.warriorhut.com/graphics/svg_to_unicode/lena_std_washed_autotrace.png
thumbnails traced http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_autotrace.png
Here's some ultra low-res shots. These would be closer to the 140 character limit though some clever path compression may be need as well.
groomed http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_groomed.png Simplified and despeckled.
trianglulated http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_triangulated.png Simplified, despeckled and triangulated.
autotrace --output-format svg --output-file cornell_box.svg --despeckle-level 20 --color-count 64 cornell_box.png
ABOVE: Simplified paths using autotrace.
Unfortunately my parser doesn't handle the autotrace output so I don't know how may points are in use or how far to simplify, sadly there's little time for writing it before the deadline. It's much easier to parse than the inkscape output though.
Alright, here's mine: nanocrunch.cpp and the CMakeLists.txt file to build it using CMake. It relies on the Magick++ ImageMagick API for most of its image handling. It also requires the GMP library for bignum arithmetic for its string encoding.
I based my solution off of fractal image compression, with a few unique twists. The basic idea is to take the image, scale down a copy to 50% and look for pieces in various orientations that look similar to non-overlapping blocks in the original image. It takes a very brute force approach to this search, but that just makes it easier to introduce my modifications.
The first modification is that instead of just looking at ninety degree rotations and flips, my program also considers 45 degree orientations. It's one more bit per block, but it helps the image quality immensely.
The other thing is that storing a contrast/brightness adjustment for each of color component of each block is way too expensive. Instead, I store a heavily quantized color (the palette has only 4 * 4 * 4 = 64 colors) that simply gets blended in in some proportion. Mathematically, this is equivalent to a variable brightness and constant contrast adjustment for each color. Unfortunately, it also means there's no negative contrast to flip the colors.
Once it's computed the position, orientation and color for each block, it encodes this into a UTF-8 string. First, it generates a very large bignum to represent the data in the block table and the image size. The approach to this is similar to Sam Hocevar's solution -- kind of a large number with a radix that varies by position.
Then it converts that into a base of whatever the size of the character set available is. By default, it makes full use of the assigned unicode character set, minus the less than, greater than, ampersand, control, combining, and surrogate and private characters. It's not pretty but it works. You can also comment out the default table and select printable 7-bit ASCII (again excluding <, >, and & characters) or CJK Unified Ideographs instead. The table of which character codes are available is stored a run-length encoded with alternating runs of invalid and valid characters.
Anyway, here are some images and times (as measured on my old 3.0GHz P4), and compressed to 140 characters in the full assigned unicode set described above. Overall, I'm fairly pleased with how they all turned out. If I had more time to work on this, I'd probably try to reduce the blockiness of the decompressed images. Still, I think the results are pretty good for the extreme compression ratio. The decompressed images are bit impressionistic, but I find it relatively easy to see how bits correspond to the original.
Stack Overflow Logo (8.6s to encode, 7.9s to decode, 485 bytes):
http://i44.tinypic.com/2w7lok1.png
Lena (32.8s to encode, 13.0s to decode, 477 bytes):
http://i42.tinypic.com/2rr49wg.png http://i40.tinypic.com/2rhxxyu.png
Mona Lisa (43.2s to encode, 14.5s to decode, 490 bytes):
http://i41.tinypic.com/ekgwp3.png http://i43.tinypic.com/ngsxep.png
Edit: CJK Unified Characters
Sam asked in the comments about using this with CJK. Here's a version of the Mona Lisa compressed to 139 characters from the CJK Unified character set:
http://i43.tinypic.com/2yxgdfk.png 咏璘驞凄脒鵚据蛥鸂拗朐朖辿韩瀦魷歪痫栘璯緍脲蕜抱揎頻蓼債鑡嗞靊寞柮嚛嚵籥聚隤慛絖銓馿渫櫰矍昀鰛掾撄粂敽牙稉擎蔍螎葙峬覧絀蹔抆惫冧笻哜搀澐芯譶辍澮垝黟偞媄童竽梀韠镰猳閺狌而羶喙伆杇婣唆鐤諽鷍鴞駫搶毤埙誖萜愿旖鞰萗勹鈱哳垬濅鬒秀瞛洆认気狋異闥籴珵仾氙熜謋繴茴晋髭杍嚖熥勳縿餅珝爸擸萿
The tuning parameters at the top of the program that I used for this were: 19, 19, 4, 4, 3, 10, 11, 1000, 1000. I also commented out the first definition of number_assigned and codes, and uncommented out the last definitions of them to select the CJK Unified character set.
My full solution can be found at http://caca.zoy.org/wiki/img2twit. It has the following features:
http://caca.zoy.org/raw-attachment/wiki/img2twit/so-logo.png http://caca.zoy.org/raw-attachment/wiki/img2twit/twitter4.png
蜥秓鋖筷聝诿缰偺腶漷庯祩皙靊谪獜岨幻寤厎趆脘搇梄踥桻理戂溥欇渹裏軱骿苸髙骟市簶璨粭浧鱉捕弫潮衍蚙瀹岚玧霫鏓蓕戲債鼶襋躻弯袮足庭侅旍凼飙驅據嘛掔倾诗籂阉嶹婻椿糢墤渽緛赐更儅棫武婩縑逡荨璙杯翉珸齸陁颗鳣憫擲舥攩寉鈶兓庭璱篂鰀乾丕耓庁錸努樀肝亖弜喆蝞躐葌熲谎蛪曟暙刍镶媏嘝驌慸盂氤缰殾譑
Here is a rough overview of the encoding process:
And this is the decoding process:
What I believe is the most original part of the program is the bitstream. Instead of packing bit-aligned values (stream <<= shift; stream |= value
), I pack arbitrary values that are not in power-of-two ranges (stream *= range; stream += value
). This requires bignum computations and is of course a lot slower, but it gives me 2009.18 bits instead of 1960 when using the 20902 main CJK characters (that's three more points I can put in the data). And when using ASCII, it gives me 917.64 bits instead of 840.
I decided against a method for the initial image computation that would have required heavy weaponry (corner detection, feature extraction, colour quantisation...) because I wasn't sure at first it would really help. Now I realise convergence is slow (1 minute is acceptable but it's slow nonetheless) and I may try to improve on that.
The main fitting loop is loosely inspired from the Direct Binary Seach dithering algorithm (where pixels are randomly swapped or flipped until a better halftone is obtained). The energy computation is a simple root-mean-square distance, but I perform a 5x5 median filter on the original image first. A Gaussian blur would probably better represent the human eye behaviour, but I didn't want to lose sharp edges. I also decided against simulated annealing or other difficult to tune methods because I don't have months to calibrate the process. Thus the "quality" flag just represents the number of iterations that are performed on each point before the encoder ends.
http://caca.zoy.org/raw-attachment/wiki/img2twit/Mona_Lisa_scaled.jpg http://caca.zoy.org/raw-attachment/wiki/img2twit/twitter2.png
苉憗揣嶕繠剳腏篮濕茝霮墧蒆棌杚蓳縳樟赒肴飗噹砃燋任朓峂釰靂陴貜犟掝喗讄荛砙矺敨鷾瓔亨髎芟氲簵鸬嫤鉸俇激躙憮鄴甮槺骳佛愚猪駪惾嫥綖珏矯坼堭颽箽赭飉訥偁箝窂蹻熛漧衆橼愀航玴毡裋頢羔恺墎嬔鑹楄瑥鶼呍蕖抲鸝秓苾绒酯嵞脔婺污囉酼俵菛琪棺则辩曚鸸職銛蒝礭鱚蟺稿纡醾陴鳣尥蟀惘鋁髚忩祤脤养趯沅况
Even though not all images compress well, I'm surprised by the results and I really wonder what other methods exist that can compress an image to 250 bytes.
I also have small movies of the encoder state's evolution from a random initial state and from a "good" initial state.
Edit: here is how the compression method compares with JPEG. On the left, jamoes's above 536-byte picture. On the right, Mona Lisa compressed down to 534 bytes using the method described here (the bytes mentioned here refer to data bytes, therefore ignoring bits wasted by using Unicode characters):
http://caca.zoy.org/raw-attachment/wiki/img2twit/minimona.jpg http://caca.zoy.org/raw-attachment/wiki/img2twit/minimona2.png
Edit: just replaced CJK text with the newest versions of the images.
The following isn't a formal submission, since my software hasn't been tailored in any way for the indicated task. DLI can be described as an optimizing general purpose lossy image codec. It's the PSNR and MS-SSIM record holder for image compression, and I thought it would be interesting to see how it performs for this particular task. I used the reference Mona Lisa image provided and scaled it down to 100x150 then used DLI to compress it to 344 bytes.
Mona Lisa DLI http://i40.tinypic.com/2md5q4m.png
For comparison with the JPEG and IMG2TWIT compressed samples, I used DLI to compress the image to 534 bytes as well. The JPEG is 536 bytes and IMG2TWIT is 534 bytes. Images have been scaled up to approximately the same size for easy comparison. JPEG is the left image, IMG2TWIT is center, and DLI is the right image.
Comparison http://i42.tinypic.com/302yjdg.png
The DLI image manages to preserve some of the facial features, most notably the famous smile :).
The general overview of my solution would be:
I know that you were asking for code, but I don't really want to spend the time to actually code this up. I figured that an efficient design might at least inspire someone else to code this up.
I think the major benefit of my proposed solution is that it is reusing as much existing technology as possible. It may be fun to try to write a good compression algorithm, but there is guaranteed to be a better algorithm out there, most likely written by people who have a degree in higher math.
One other important note though is that if it is decided that utf16 is the preferred encoding, then this solution falls apart. jpegs don't really work when compressed down to 280 bytes. Although, maybe there is a better compression algorithm than jpg for this specific problem statement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With