Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I compress a sequence of integers?

I have an array which contains data within range -255 to +255.e.g. The array can be like this:

  int data[]={234,56,-4,24,56,78,23,89,234,68,-12,-253,45,128};

Here, order must be preserved while decompressing e.g. after 1st term 234, 56 must come.

So, what are the ways to compress any arbitrary sequence of numbers for which any repeating pattern can't be observed?

like image 995
Debadyuti Maiti Avatar asked Sep 01 '12 11:09

Debadyuti Maiti


People also ask

Can numbers be compressed?

Well, yes, a big number can be expressed mathematically, and potentially save some space doing so. So if you convert each character to its ASCII value, then every character is expanded from a single byte to 1, 2, or 3 bytes. That is, 'A' becomes '65'. 'z' becomes '122'.

Can you compress binary?

When you use COMPRESS=BINARY, patterns of multiple characters across the entire observation are compressed. Binary compression uses two techniques at the same time. This option searches for the following: Repeating byte sequences (for example, 10 blank spaces or 10 zero bytes in a row)

How do you compress data?

Data compression is a process in which the size of a file is reduced by re-encoding the file data to use fewer bits of storage than the original file. A fundamental component of data compression is that the original file can be transferred or stored, recreated, and then used later (with a process called decompression).


3 Answers

A range of -255 to 255 means 511 values -> 9 bits. If you take the sign separately, 1 bit for sign and a byte for value.

You can write your array as a byte array, each byte value will be the absolute value of the related int.

In a separate zone (a long, or perhaps a byte array), store the sign bit.

like image 134
SJuan76 Avatar answered Nov 14 '22 07:11

SJuan76


If there are truly no patterns in the data then a useful compression algorithm is impossible. Don't even bother trying!

Of course, in this case because all the numbers are in a restricted range n then you do have a pattern in the bits - namely that your high bits are either all 0 (positive) or all 1 (negative).

Standard compression algorithms like zip would therefore work if you want to compress reasonably effectively (assuming you have a long enough array of numbers to make it worthwhile).

Alternatively you can exploit the fact that you are effectively using 9-bit numbers. So you could roll your own compression algorithm by laying out the numbers as a long stream of 9-bit chunks and putting this into a byte array.

like image 26
mikera Avatar answered Nov 14 '22 08:11

mikera


In your situation (when repeating pattern can't be observed), variable-length coding may help you.

For example, Elias gamma-coding and Exponential-Golomb coding. The general idea - is that small numbers needs only few bits to be encoded. Exp-Golomb coding is used in the H.264/MPEG-4 AVC video compression standard. It is very easy to encode and decode sequences with it, also it is not very hard to implement this coding.

The way to code all integers is to set up a bijection, mapping integers (0, 1, -1, 2, -2, 3, -3, ...) to (1, 2, 3, 4, 5, 6, 7, ...) before coding.

For example:

Sequence (after bijection) [ 0, 2, 5, 8, 5, 2 ] would be encoded as 101100110000100100110011 - As you may see - there is no repeating patterns in this sequence, but it encoded only with 24 bits.

Short description of decoding process:

  1. Read from input stream and count leading zero-bits (until you find non-zero bit) -> zero_bits_count

  2. Read from input stream next ( zero_bits_count + 1 ) bits -> binary

  3. Convert binary to decimal

  4. Return ( decimal - 1 )

1... -> no leading zeros, zero_bits_count = 0 -> read next 1 bit -> [1]... -> [1] is 1 -> 1 - 1 = 0

011... -> [0] - one leading zero, zero_bits_count = 1 -> read next 2 bits -> [11]... -> [11] is 3 -> 3 - 1 = 2

00110... -> [00] - two leading zeros, zero_bits_count = 2 -> read next 3 bits -> [110]... -> [110] is 6 -> 6 - 1 = 5

etc.

like image 27
stemm Avatar answered Nov 14 '22 09:11

stemm