Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String encoding of primitive types preserving lexicographic order

Does anyone know of a library for encoding a number of primitive types (like integers, floats, strings, etc) into a string but preserving the lexicographical order of the types?

Ideally, I'm looking for a C++ library, but other languages are fine too. Also, one can assume that the format does not need to be encoded in the string itself (that is, if it's int64/string/float then the encoded string does not need to encode this information, only encoding the data is enough).

like image 298
nilton Avatar asked Nov 19 '09 03:11

nilton


2 Answers

Take a look at this paper ("Efficient Lexicographic Encoding of Numbers") which shows how to represent any numeric type as a string such the lexicographic order of the strings is the same as the numerical order of the underlying numbers. It copes with arbitrary length numbers.

http://www.zanopha.com/docs/elen.pdf

like image 114
Peter Avatar answered Oct 14 '22 10:10

Peter


I had the problem of converting integers and longs to strings which preserve ordering. And since I was working in Java, I only had signed types.

My algorithm was very simple:

  1. Flip the sign bit (toEncode ^ Long.MAX_VALUE for longs) otherwise negative numbers are greater than positive numbers.
  2. Do a modified base64 encoding of the bytes. Unfortunately, the normal base64 encoding doesn't preserve ordering; the special characters (+ and /) are after the numbers which are after the characters. This is completely backwards from ASCII. My modified encoding simply uses the ASCII ordering. (To make it clear it wasn't normal base64, I changed the special chars to - and _ with ~ as the padding. These are still useable within an URL, which was another a constraint I had.)
like image 25
Michael Deardeuff Avatar answered Oct 14 '22 10:10

Michael Deardeuff