I have imported tables from Postgres to hdfs by using sqoop. My table have uuid field as primary key and my command sqoop as below:
sqoop import --connect 'jdbc:postgresql://localhost:5432/mydb' --username postgreuser --password 123456abcA --driver org.postgresql.Driver --table users --map-column-java id=String --target-dir /hdfs/postgre/users --as-avrodatafile --compress -m 2
But I got the error:
Import failed: java.io.IOException: org.postgresql.util.PSQLException: ERROR: function min(uuid) does not exist
I tried executed the sql command: SELECT min(id) from users
and got the same error. How could I fix it ? I use Postgres 9.4, hadoop 2.9.0 and sqoop 1.4.7
UUID is an abbreviation for Universal Unique Identifier defined by RFC 4122 and has a size of 128-bit. It is created using internal algorithms that always generate a unique value. PostgreSQL has its own UUID data type and provides modules to generate them.
Unfortunately, while PostgreSQL is great for storing and comparing UUID data, it lacks capabilities for creating UUID values in its core. Instead, it relies on third-party modules to create UUIDs using specified techniques.
The GUID/UUID is not a text in either PostgreSQL or SQL Server system. They are native typs in both systems and are stored using 16 bytes. PostgreSQL does although have auto casts built in for converting uuids to text where as SQL Server requires an explicit cast to varchar.
(Some systems refer to this data type as a globally unique identifier, or GUID, instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm.
I'd like to credit @robin-salih 's answer, I've used it and implementation of min for int, to build following code:
CREATE OR REPLACE FUNCTION min(uuid, uuid)
RETURNS uuid AS $$
BEGIN
IF $2 IS NULL OR $1 > $2 THEN
RETURN $2;
END IF;
RETURN $1;
END;
$$ LANGUAGE plpgsql;
create aggregate min(uuid) (
sfunc = min,
stype = uuid,
combinefunc = min,
parallel = safe,
sortop = operator (<)
);
It almost the same, but takes advantages of B-tree index, so select min(id) from tbl
works in few millis.
P.S. I'm not pgsql expert, perhaps my code is somehow wrong, double check before use in production, but I hope it uses indexes and parallel execution correctly. I've made it just from sample code, not digging into theory behind aggregates in PG.
Postgres doesn't have built-in function for min/max uuid, but you can create your own using the following code:
CREATE OR REPLACE FUNCTION min(uuid, uuid)
RETURNS uuid AS $$
BEGIN
IF $2 IS NULL OR $1 > $2 THEN
RETURN $2;
END IF;
RETURN $1;
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE min(uuid)
(
sfunc = min,
stype = uuid
);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With