I have an array which contains data within range -255 to +255.e.g. The array can be like this:
int data[]={234,56,-4,24,56,78,23,89,234,68,-12,-253,45,128};
Here, order must be preserved while decompressing e.g. after 1st term 234, 56 must come.
So, what are the ways to compress any arbitrary sequence of numbers for which any repeating pattern can't be observed?
Well, yes, a big number can be expressed mathematically, and potentially save some space doing so. So if you convert each character to its ASCII value, then every character is expanded from a single byte to 1, 2, or 3 bytes. That is, 'A' becomes '65'. 'z' becomes '122'.
When you use COMPRESS=BINARY, patterns of multiple characters across the entire observation are compressed. Binary compression uses two techniques at the same time. This option searches for the following: Repeating byte sequences (for example, 10 blank spaces or 10 zero bytes in a row)
Data compression is a process in which the size of a file is reduced by re-encoding the file data to use fewer bits of storage than the original file. A fundamental component of data compression is that the original file can be transferred or stored, recreated, and then used later (with a process called decompression).
A range of -255 to 255 means 511 values -> 9 bits. If you take the sign separately, 1 bit for sign and a byte for value.
You can write your array as a byte array, each byte value will be the absolute value of the related int.
In a separate zone (a long, or perhaps a byte array), store the sign bit.
If there are truly no patterns in the data then a useful compression algorithm is impossible. Don't even bother trying!
Of course, in this case because all the numbers are in a restricted range n then you do have a pattern in the bits - namely that your high bits are either all 0 (positive) or all 1 (negative).
Standard compression algorithms like zip would therefore work if you want to compress reasonably effectively (assuming you have a long enough array of numbers to make it worthwhile).
Alternatively you can exploit the fact that you are effectively using 9-bit numbers. So you could roll your own compression algorithm by laying out the numbers as a long stream of 9-bit chunks and putting this into a byte array.
In your situation (when repeating pattern can't be observed), variable-length coding may help you.
For example, Elias gamma-coding and Exponential-Golomb coding. The general idea - is that small numbers needs only few bits to be encoded. Exp-Golomb coding is used in the H.264/MPEG-4 AVC video compression standard. It is very easy to encode and decode sequences with it, also it is not very hard to implement this coding.
The way to code all integers is to set up a bijection, mapping integers (0, 1, -1, 2, -2, 3, -3, ...) to (1, 2, 3, 4, 5, 6, 7, ...) before coding.
For example:
Sequence (after bijection) [ 0, 2, 5, 8, 5, 2 ]
would be encoded as 101100110000100100110011
- As you may see - there is no repeating patterns in this sequence, but it encoded only with 24 bits.
Short description of decoding process:
Read from input stream and count leading zero-bits (until you find non-zero bit) -> zero_bits_count
Read from input stream next ( zero_bits_count + 1 ) bits -> binary
Convert binary to decimal
Return ( decimal - 1 )
1... -> no leading zeros, zero_bits_count = 0 -> read next 1 bit -> [1]... -> [1] is 1 -> 1 - 1 = 0
011... -> [0] - one leading zero, zero_bits_count = 1 -> read next 2 bits -> [11]... -> [11] is 3 -> 3 - 1 = 2
00110... -> [00] - two leading zeros, zero_bits_count = 2 -> read next 3 bits -> [110]... -> [110] is 6 -> 6 - 1 = 5
etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With