Programming novice: How to program my own data compression algorithm?

Tags:

c

algorithm

compression

It is summer, so I have decided to write a data-compression program, preferably in C. I have a decent beginner's understanding of how compression works. I just have a few questions:

1) Would C be a suitable programming language for this task?
2) Should I be working with bytes from the input file, or at the bit level somehow?

If someone could give me a nudge in the right direction, I'd really appreciate it. I would like to code this myself, however, and not use a pre-existing compression library or anything like that.

asked May 24 '11 17:05

araisbec

2 Answers

You could start by looking at Huffman encoding. A lot of computer science classes implement it as a project, so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts. There are slides, hints, and an example project available in Java for a master's-level project at the University of Pennsylvania (search for "huff" on that page).
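To give a feel for the core of Huffman encoding, here is a minimal sketch in C. It only counts byte frequencies and works out how many bits each byte value would get. It does not assign the actual bit patterns or write a file. The function name `huffman_code_lengths` and the simple "scan for the two lightest nodes" loop are my own choices for illustration, not taken from the linked course material. A real implementation would normally use a priority queue instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSYM 256  /* one leaf per possible byte value */

/* Hypothetical helper: fills lengths[s] with the Huffman code length in bits
 * for byte value s, or 0 if s never occurs in data. */
void huffman_code_lengths(const unsigned char *data, size_t len,
                          unsigned char lengths[NSYM])
{
    size_t weight[2 * NSYM];  /* occurrence count of each tree node */
    int    parent[2 * NSYM];  /* parent node index, -1 if none */
    int    alive[2 * NSYM];   /* 1 while the node still awaits merging */
    int    nodes = NSYM;      /* internal nodes are appended after the leaves */
    int    remaining = 0;

    assert(data != NULL || len == 0);

    /* Step 1: count how often each byte value occurs. */
    memset(weight, 0, sizeof weight);
    for (size_t i = 0; i < len; i++) weight[data[i]]++;

    for (int i = 0; i < 2 * NSYM; i++) { parent[i] = -1; alive[i] = 0; }
    for (int s = 0; s < NSYM; s++)
        if (weight[s] > 0) { alive[s] = 1; remaining++; }

    /* Step 2: repeatedly merge the two lightest nodes into a new parent. */
    while (remaining > 1) {
        int a = -1, b = -1;  /* a = lightest, b = second lightest */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[a] = parent[b] = nodes;
        alive[a] = alive[b] = 0;
        alive[nodes] = 1;
        nodes++;
        remaining--;  /* two nodes removed, one added */
    }

    /* Step 3: a symbol's code length is its depth in the tree. */
    for (int s = 0; s < NSYM; s++) {
        unsigned char depth = 0;
        for (int n = s; parent[n] >= 0; n = parent[n]) depth++;
        /* Special case: a single distinct symbol still needs one bit. */
        if (weight[s] > 0 && depth == 0) depth = 1;
        lengths[s] = depth;
    }
}
```

For the input `aaaabbc`, this gives `a` a 1-bit code and `b` and `c` 2-bit codes, so the seven bytes would need 10 bits of payload. Frequent bytes get short codes, which is the whole idea.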

answered Sep 28 '22 20:09

Brian Lyttle

To answer your questions:

1) C is suitable.
2) It depends on the algorithm, or on what you mean by "compression".
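On the bytes-versus-bits question: files are always read and written as bytes, but some algorithms (Huffman, for example) produce codes that are not a whole number of bytes long, so you end up packing bits into bytes yourself. Here is a minimal sketch of that idea in C. The `BitWriter` type and the `bw_*` function names are hypothetical, and the most-significant-bit-first order is just one possible convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit writer: packs bits into a caller-supplied byte buffer,
 * filling each byte from its most significant bit downwards. */
typedef struct {
    uint8_t *buf;    /* output bytes */
    size_t   cap;    /* capacity of buf in bytes */
    size_t   nbits;  /* number of bits written so far */
} BitWriter;

void bw_init(BitWriter *bw, uint8_t *buf, size_t cap)
{
    assert(bw != NULL && buf != NULL);
    memset(buf, 0, cap);  /* start from all-zero bytes so we only set 1 bits */
    bw->buf = buf;
    bw->cap = cap;
    bw->nbits = 0;
}

/* Appends a single bit. Returns 0 on success, -1 if the buffer is full. */
int bw_put_bit(BitWriter *bw, int bit)
{
    size_t byte = bw->nbits / 8;
    if (byte >= bw->cap) return -1;
    if (bit) bw->buf[byte] |= (uint8_t)(0x80u >> (bw->nbits % 8));
    bw->nbits++;
    return 0;
}

/* Appends the lowest `count` bits of `value`, most significant first.
 * Returns 0 on success, -1 if the buffer fills up part-way through. */
int bw_put_bits(BitWriter *bw, uint32_t value, unsigned count)
{
    assert(count <= 32);
    while (count-- > 0) {
        if (bw_put_bit(bw, (int)((value >> count) & 1u)) != 0) return -1;
    }
    return 0;
}
```

After writing, the first `(nbits + 7) / 8` bytes of the buffer hold the packed data and can be written out with `fwrite`. A decoder needs to know `nbits` (or an equivalent end marker) to ignore the padding bits in the last byte.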

My suggestion: first decide whether you want lossless or lossy compression, then pick an algorithm to implement. Here are a few pointers:

For lossless compression, some algorithms are very intuitive, such as run-length encoding: if there are 11 a's followed by 5 b's, you just encode them as 11a5b. Some algorithms use a dictionary; see LZW encoding. Finally, I do recommend Huffman encoding, since it is straightforward, simple, and a good way to gain experience with algorithms (for your educational purposes).
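As a rough illustration of the run-length idea, here is a small C sketch that produces the textual form used above. The function name `rle_encode` and its return convention are made up for this example. It assumes the input contains no digits (otherwise the output could not be decoded unambiguously), and a practical encoder would more likely write binary (count, byte) pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run-length encodes the NUL-terminated string `in`
 * into `out` as text, e.g. "aaabb" becomes "3a2b".
 * Returns the length of the encoded string, or -1 if `out` is too small.
 * Assumes `in` contains no digit characters. */
int rle_encode(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0) return -1;
    out[0] = '\0';

    for (size_t i = 0; in[i] != '\0'; ) {
        /* Measure the run of identical characters starting at i.
         * in[i] is not NUL here, so the scan stops at the terminator. */
        size_t run = 1;
        while (in[i + run] == in[i]) run++;

        /* Append "<count><char>" and check that it fit, NUL included. */
        int n = snprintf(out + used, out_size - used, "%zu%c", run, in[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;

        used += (size_t)n;
        i += run;
    }
    return (int)used;
}
```

Note that input without repeats gets longer, not shorter: `abc` becomes `1a1b1c`. Run-length encoding only pays off on data with long runs, which is worth keeping in mind when you test it.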

For lossy compression, transform coding is typical: JPEG uses the discrete cosine transform (DCT), a close relative of the discrete Fourier transform, and JPEG 2000 uses wavelets. These are useful for understanding multimedia compression.

The Wikipedia pages on these topics are a good starting point.

answered Sep 28 '22 18:09

Ivan Xiao
