
Is the source code for LZ4 compression 64-bit compatible?

I have just downloaded the sources for LZ4-HC compression and am checking them for 64-bit compatibility.

I'm getting a few warnings: "conversion from '__int64' to 'unsigned int', possible loss of data"

When I kept digging, I noticed the macro ADD_HASH(p). The last part of that macro is

HashTable[HASH_VALUE(p)] = (p) - base;

p - const BYTE*
base - const BYTE* const for 64-bit.   (const int b - for 32-bit)
HTYPE HashTable[];
HTYPE is U32 for 64-bit platform       (const BYTE* - for 32-bit)

On 32-bit, we subtract a const int from a pointer and store the result in another pointer, which is safe enough.

Now 64-bit: it looks to me like subtracting two pointers on a 64-bit platform and saving the result into a U32 is not safe at all!

My understanding is that LZ4 is 64-bit compatible only if "p" and "base" are guaranteed not to be far apart, and now I have to dig deeper into the logic to check that.

Did I miss anything? Has anybody checked this library for the full 64-bit compatibility it claims? Are there any other known issues with the library's code?

adspx5 asked Nov 04 '22 13:11


1 Answer

LZ4 is supposed to be 64-bit compatible. It has been tested numerous times already.

LZ4-HC is a bit more complex, and maybe there are some compiler warnings left. Feel free to report them on the issue list: http://code.google.com/p/lz4/issues/list

The subtraction of two pointers yields a ptrdiff_t, a signed type that is typically as wide as size_t, so 64 bits on a 64-bit CPU. Casting the result to 32 bits may therefore truncate it and create an overflow issue.
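To make the truncation concrete, here is a minimal sketch. The helper name offset_as_u32 is hypothetical and not part of LZ4; it takes the offset as a plain 64-bit integer so that no multi-gigabyte buffer is needed to try it.

```c
#include <stdint.h>

typedef uint32_t U32;   /* same name the question uses for the table entry type */

/* Hypothetical helper: the value left in a U32 slot after storing a
 * 64-bit offset (p - base) with an explicit cast. The cast silences the
 * warning but still keeps only the low 32 bits. */
static inline U32 offset_as_u32(uint64_t offset)
{
    return (U32)offset;   /* well defined: reduced modulo 2^32 */
}
```

With this helper, offset_as_u32(0x100000000ULL + 100) and offset_as_u32(100) both give 100, so two positions exactly 4 GB apart become indistinguishable once stored.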

This is however unlikely. LZ4 works on a 64 KB window, which means any reference beyond 64 KB is disregarded. For a very long range reference to become an issue, it would need to sit at a distance of exactly 4 GB plus a few KB. Moreover, since references are kept in a list, there must be absolutely zero references using the same hash between the 64 KB and 4 GB marks. This is also extremely unlikely.
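The arithmetic behind that argument can be sketched as follows. The names WINDOW_SIZE, apparent_distance and looks_in_window are assumptions made for illustration, not LZ4 identifiers, and positions are passed as 64-bit integers rather than real pointers.

```c
#include <stdint.h>

#define WINDOW_SIZE 65536u   /* 64 KB window, as described above */

/* Hypothetical helper: distance between two positions when only the
 * low 32 bits of each offset are kept. */
static inline uint32_t apparent_distance(uint64_t cur, uint64_t ref)
{
    return (uint32_t)((uint32_t)cur - (uint32_t)ref);   /* modulo 2^32 */
}

/* Hypothetical helper: nonzero if the reference appears to fall inside
 * the window when judged from the truncated offsets alone. */
static inline int looks_in_window(uint64_t cur, uint64_t ref)
{
    return apparent_distance(cur, ref) < WINDOW_SIZE;
}
```

A reference 4 GB + 100 bytes back, for example looks_in_window(0x100000000ULL + 1000, 900), is wrongly accepted because its apparent distance is 100. A reference a few megabytes back is still rejected, which is why only distances of 4 GB plus less than 64 KB matter.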

Even then, if such a case could be intentionally forged, the end effect is that the compressor will be "hinted" towards a position which is not a match, and will discard it at the compare operation.

So the only downside is the risk of losing a few CPU cycles on a useless comparison. Quite fair.

Nonetheless, it's always better to remove compiler warnings whenever doing so is "almost free", meaning no loss of performance and negligible impact on code complexity.
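A rough sketch of why a bad hint is harmless is given below. This is not the actual LZ4-HC code: is_valid_match, MAX_DISTANCE and the 4-byte MINMATCH are assumptions for illustration. The point is only that a candidate taken from the hash table is trusted after its bytes have been compared, never before.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DISTANCE 65535u   /* assumed window limit for this sketch */
#define MINMATCH     4        /* assumed minimum match length */

/* Hypothetical check: 'stored' is an offset read back from the hash
 * table, 'pos' is the current offset, both relative to 'base'.
 * The caller must guarantee that pos + MINMATCH bytes are readable. */
static inline int is_valid_match(const unsigned char *base, size_t pos,
                                 uint32_t stored)
{
    size_t cand = stored;
    if (cand >= pos) return 0;                  /* must point backwards */
    if (pos - cand > MAX_DISTANCE) return 0;    /* outside the window */
    /* A wrong hint fails here and is simply dropped. */
    return memcmp(base + cand, base + pos, MINMATCH) == 0;
}
```

For the buffer "abcdXXabcd" with pos = 6, a stored offset of 0 is accepted because "abcd" matches, while a stored offset of 1 is rejected because "bcdX" does not. The cost of the bad hint is one failed comparison.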

Cyan answered Nov 15 '22 01:11