It is summer, and so I have decided to take it upon myself to write a data-compression program, preferably in C code. I have a decent beginners understanding of how compression works. I just have a few questions:
1) Would c be a suitable programming language to accomplish this task?
2) Should I be working in byte's with the input file? Or at a binary level somehow?
If someone could just give me a nudge in the correct direction, I'd really appreciate it. I would like to code this myself however, and not use a pre-existing compression library or anything like that.
Most forms of lossy compression are based on transform coding, especially the discrete cosine transform (DCT).
The fastest algorithm, lz4, results in lower compression ratios; xz, which has the highest compression ratio, suffers from a slow compression speed. However, Zstandard, at the default setting, shows substantial improvements in both compression speed and decompression speed, while compressing at the same ratio as zlib.
You could start by looking at Huffman Encoding. A lot of computer science classes implement that as a project so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts.There are slides, hints, and an example project available in Java for a masters-level project at the University of Pennsylvania (search for "huff" on that page).
To answer your questions:
My opinion will be, first decide whether you want to do a lossless compression
or a lossy compression
, then pick an algorithm to implement. Here are a few pointers:
For the lossless one, some are very intuitive, such as the run-length
encoding,
e.g., if there is 11 a
s and 5 b
s, you just encode them as 11a5b
.
Some algorithms use a dictionary
, please refer to LZW encoding
.
Finally, I do recommend Huffman
encoding since it is very straight-forward, simple and helpful to gain experience in learning algorithm (for your educational purpose).
For lossy ones, Discrete Fourier Transform (DFT)
, or wavelet
, is used in JPEG compression. This is useful to understand multimedia compression.
Wikipedia page is a good starting point.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With