Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I cast an long NUMERIC integer into a bit string in PostgreSQL?

I'm trying to calculate hamming distance for pairs of long integers (20 digits each) in a Django app using the pg_similarity extension for Postgres, and am having a hard time figuring out how to do this. Django does not seem to have a current BitString field (which would be ideal, but django_postgres seems to be defunct), so I was trying to just cast the integers into bitstrings in the SQL query itself. My current query:

    sql = ''' SELECT id, hamming(
        "HashString"::BIT(255),
        %s::BIT(255)
    ) as hamming_distance
    FROM images
    HAVING hamming_distance < %s
    ORDER BY hamming_distance;'''

is throwing a DB error: cannot cast type numeric to bit. What am I doing wrong? What else could I try?

like image 946
rafiki Avatar asked Oct 09 '14 19:10

rafiki


2 Answers

Per the manual, casting is the correct approach if your "long integer" is actually a "long integer" i.e. bigint / int8:

regress=> SELECT ('1324'::bigint)::bit(64);
                               bit                                
------------------------------------------------------------------
 0000000000000000000000000000000000000000000000000000010100101100
(1 row)

but (edit) you're actually asking how to cast an integer-only numeric to bit. Not so simple, hold on.

You can't bitshift numeric either, so you can't easily bitshift it into 64-bit chunks, convert, and reassemble.

You'll have to use division and modulus instead.

Given:

SELECT '1792913810350008736973055638379610855835'::numeric(40,0);

you can get it in 'bigint' chunks that, when multiplied by max-long (9223372036854775807) times their place value produce the original value.

e.g. this gets the lowest 64-bits:

SELECT ('1792913810350008736973055638379610855835'::numeric(40,0) / '9223372036854775807'::numeric(256,0)) % '9223372036854775807'::numeric(40,0);

and this gets all the chunks for a given value of up to 256 digits and their exponents

WITH numval(v) AS (VALUES ('1792913810350008736973055638379610855835'::numeric(40,0)))
SELECT exponent, floor(v / ('9223372036854775807'::numeric(256,0) ^ exponent) % '9223372036854775807'::numeric(40,0)) from numval, generate_series(1,3) exponent;

You can reassemble this into the original value:

WITH
  numval(v) AS (
    VALUES ('1792913810350008736973055638379610855835'::numeric(40,0))
  ),
  chunks (exponent, chunk) AS (
     SELECT exponent, floor(v / ('9223372036854775807'::numeric(40,0) ^ exponent) % '9223372036854775807'::numeric(40,0))::bigint from numval, generate_series(1,3) exponent
  )
SELECT floor(sum(chunk::numeric(40,0) * ('9223372036854775807'::numeric(40,0) ^ exponent))) FROM chunks;

so we know it's decomposed correctly.

Now we're working with a series of 64-bit integers, we can convert each into a bitfield. Because we're using signed integers, each only has 63 significant bits, so:

WITH
  numval(v) AS (
    VALUES ('1792913810350008736973055638379610855835'::numeric(40,0))
  ),
  chunks (exponent, chunk) AS (
     SELECT exponent, floor(v / ('9223372036854775807'::numeric(40,0) ^ exponent) % '9223372036854775807'::numeric(40,0))::bigint from numval, generate_series(1,3) exponent
  )
SELECT
  exponent,
  chunk::bit(63)
FROM chunks;

gives us the bit values for each 63-bit chunk. We can then reassemble them. There's no bitfield concatenation operator, but we can shift and bit_or, then wrap it into an SQL function, producing the monstrosity:

CREATE OR REPLACE FUNCTION numericint40_to_bit189(numeric(40,0)) RETURNS bit(189)
LANGUAGE sql
AS
$$
    WITH
      chunks (exponent, chunk) AS (
         SELECT exponent, floor($1 / ('9223372036854775807'::numeric(40,0) ^ exponent) % '9223372036854775807'::numeric(40,0))::bigint 
         FROM generate_series(1,3) exponent
      )
    SELECT
      bit_or(chunk::bit(189) << (63*(exponent-1)))
    FROM chunks;
$$;

which can be seen in use here:

regress=> SELECT numericint40_to_bit189('1792913810350008736973055638379610855835');
                                                                                    numericint40_to_bit189                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010101000100110101101010001110110110101001111100011100011110000010110
(1 row)
like image 66
Craig Ringer Avatar answered Sep 25 '22 16:09

Craig Ringer


Thanks for the initial answer Craig Ringer! Here is a correct version of the function. It supports up to 300 bits and can be expanded if needed.

CREATE OR REPLACE FUNCTION numeric_to_bit(NUMERIC)
  RETURNS BIT VARYING AS $$
DECLARE
  num ALIAS FOR $1;
  -- 1 + largest positive BIGINT --
  max_bigint NUMERIC := '9223372036854775808' :: NUMERIC(19, 0);
  result BIT VARYING;
BEGIN
  WITH
      chunks (exponent, chunk) AS (
        SELECT
          exponent,
          floor((num / (max_bigint ^ exponent) :: NUMERIC(300, 20)) % max_bigint) :: BIGINT
        FROM generate_series(0, 5) exponent
    )
  SELECT bit_or(chunk :: BIT(300) :: BIT VARYING << (63 * (exponent))) :: BIT VARYING
  FROM chunks INTO result;
  RETURN result;
END;
$$ LANGUAGE plpgsql;
like image 33
Sam Leung Avatar answered Sep 23 '22 16:09

Sam Leung