Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert PostgreSQL bytea-stored serialized-java-UUID to postgresql-UUID

One of our software-projects uses a PostgreSQL-table with a column 'guid' of type bytea.

This is used with hibernate 3.3.2.GA with PostgreSQL 8.4, which serializes the java UUID type using java object serialization. The result is a value like the following escape format bytea literal:

'\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002‌​J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\3‌​42\004M '

... which we cannot easily use in a query as select or condition to retrieve relevant rows.

Does anyone have a way to read or use the bytea-column in the select- or where-parts of a query (e.g. via psql or pgadmin3), without setting up some hibernate-query?

like image 645
FvHovell Avatar asked Jul 22 '14 09:07

FvHovell


1 Answers

Update: See edit to question, this answer applies to the commonplace 16-byte serializations of uuid; the question was amended to reflect java serialization.


Interesting problem. I landed up writing a simple C extension to do it efficiently, but it's probably more sensible to use the PL/Python version below.

Because uuid is a fixed sized type and bytea is varlena you can't just create cast ... as implicit to binary-coerce them, because the variable length field header would get in the way.

There's no built-in function for bytea input to return a uuid. It'd be a handy thing to have, but I don't think anyone's done it yet.

Simplest way

Update: There's actually a simple way to do this. bytea in hex form is actually a valid uuid literal once the \x is stripped off, because uuid_in accepts plain undecorated hex without - or {}. So just:

regress=> SET bytea_output = 'hex';
SET
regress=> SELECT CAST( substring(CAST (BYTEA '\x0FCC6350118D11E4A5597DE5338EB025' AS text) from 3) AS uuid);
              substring               
--------------------------------------
 0fcc6350-118d-11e4-a559-7de5338eb025
(1 row)

It involves a couple of string copies and a hex encode/decode cycle, but it'll be tons faster than any of the PL answers I suggested earlier, though slower than C.

Other options

Personally I recommend using PL/Perl or pl/pythonu. I'll follow up with an example.

Assuming your uuid is the hex-format bytea literal:

'\x0FCC6350118D11E4A5597DE5338EB025'

you could turn it into a uuid type with:

PL/Perl

create language plperlu;

create or replace function to_uuid(bytea) returns uuid language plperlu immutable as $$
use Data::UUID;
my $ug = new Data::UUID;
my $uuid = $ug->from_hexstring(substr($_[0],2));
return $ug->to_string($uuid);
$$
SET bytea_output = hex;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

PL/Python

It's probably faster and cleaner in Python because the PL/Python interface passes bytea as raw bytes not as hex strings:

CREATE LANGUAGE plpythonu;

CREATE or replace function to_uuid(uuidbytes bytea) 
RETURNS uuid LANGUAGE plpythonu IMMUTABLE 
AS $$
import uuid
return uuid.UUID(bytes=uuidbytes)
$$;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

In C, just for kicks. Ugly hack.

You can see the C extension module here.

But really, I mean it about it being ugly. If you want it done properly in C, it's best to actually patch PostgreSQL rather than use an extension.

like image 180
Craig Ringer Avatar answered Oct 03 '22 09:10

Craig Ringer