The PostgreSQL types bytea and bit varying sound similar:
bytea stores binary strings.
bit varying stores strings of 1's and 0's.
The documentation does not mention a maximum size for either. Is it 1GB like character varying?
I have two separate use cases, both over a table with millions of rows:
Storing MD5 hashes
That would be a bytea with a length of 16 bytes or a bit(128). It would be used for:
GROUP BY, with an index I suppose.
WHERE md5 = for exact matches only.
Storing arbitrary binary data
Strings of binary data of varying length up to 4kB for:
Working example for the bitwise operation, using bit varying. The mask is X'00FF00' and it returns only the row X'AAAAAA'. I shortened the strings for the example, but the operation would run over their full length, up to 4kB. Is it possible to do something similar with bytea?
CREATE TABLE test1 (mystring bit varying);
INSERT INTO test1 VALUES (X'AAAAAA'), (X'ABCABC');
SELECT * FROM test1 WHERE mystring & X'00FF00' = X'00AA00';
Which of bytea and bit varying is the more appropriate?
I saw the UUID type is made to store exactly 16 bytes; would that be any advantage for storing the MD5s?
The BYTEA data type allows storage of binary strings. Values are stored in the table itself, with large values moved out of line by TOAST, so a single value is limited to 1 GB. Unlike character strings, it can hold arbitrary bytes, including zero bytes and non-printable characters. The default output format is hex (as of PostgreSQL 9.0); the older escape format is still accepted on input.
In general, if you're not using bitwise operations you should be using bytea.
I store larger values in bytea and then convert substrings to bit varying for bitwise operations where possible, mostly because clients understand bytea much more consistently than bit varying and the I/O format is more compact.
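As a rough sketch of that approach, here is the question's mask example redone on a bytea column. The table name test2 is made up for illustration. The first query uses the documented get_byte function (offsets are zero-based); the second relies on bit-string input accepting an 'x' hex prefix, which is a widely used trick that you should verify on your PostgreSQL version before depending on it.

```sql
-- Hypothetical table mirroring test1 from the question, but with bytea.
CREATE TABLE test2 (mystring bytea);
INSERT INTO test2 VALUES ('\xAAAAAA'), ('\xABCABC');

-- Option 1: inspect a single byte with get_byte (zero-based offset),
-- then apply an integer mask. 255 is 0xFF and 170 is 0xAA.
SELECT * FROM test2
WHERE get_byte(mystring, 1) & 255 = 170;

-- Option 2: convert to bit varying through hex text, then reuse the
-- bit-string mask from the question. Assumes the 'x' prefix is accepted
-- by the bit varying input routine on your version.
SELECT * FROM test2
WHERE ('x' || encode(mystring, 'hex'))::bit varying & X'00FF00' = X'00AA00';
```

For 4kB values, converting only the substring you need (for example with substring(mystring from 2 for 1)) before the cast keeps the temporary bit string small.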
MD5 values should be stored as bytea. Bitwise operations on them make no sense, and you generally want to fetch them as binary.
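A minimal sketch of the MD5 use case from the question, assuming a made-up table md5_demo: the built-in md5() returns hex text, so decode(..., 'hex') turns it into the 16 raw bytes. The CHECK constraint and index name are illustrative choices, not something from the original posts.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE md5_demo (
    id  integer PRIMARY KEY,
    md5 bytea NOT NULL CHECK (octet_length(md5) = 16)
);
CREATE INDEX md5_demo_md5_idx ON md5_demo (md5);

-- md5() yields 32 hex characters; decode() stores them as 16 bytes.
INSERT INTO md5_demo
SELECT n, decode(md5(n::text), 'hex')
FROM generate_series(1, 1000) AS n;

-- Exact-match lookup, which can use the B-tree index.
SELECT id FROM md5_demo WHERE md5 = decode(md5('42'), 'hex');

-- Grouping on the hash column.
SELECT md5, count(*) FROM md5_demo GROUP BY md5;
```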
I think bit varying really has two uses:
For pretty much everything else, use bytea.
There's nothing stopping you storing a 4k bitfield if that's what it is, though.
The maximum size of bytea is 1 GB. [1]
bit varying (see the explanation below).
bytea. It will take less storage than bit varying.
UUID: the UUID algorithm somehow guarantees uniqueness, not only in your table but also in your database or even across databases (even if you generate the UUID in your application). I think that if you are using UUIDs without dashes, it will be more efficient to store, compare and sort them as UUID (see the comparison between bytea and UUID below).
For bitwise operations, use bit varying.
If you are concerned about storage: bit varying takes more storage than bytea. If that is acceptable, you should compare the functions they both offer: bit varying vs bytea.
So far, I can see that bit varying will be more suitable for your bitwise operations, though bytea is the generally accepted way to store arbitrary data.
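One way to check the storage claim yourself is pg_column_size, which reports the bytes used by a value. The sketch below builds the same 128-bit MD5 as bytea, bit(128) and uuid; the input string 'example' is arbitrary, and the numbers for standalone expressions can differ slightly from what is stored on disk in a table row.

```sql
-- Compare the size of one MD5 value in three candidate types.
-- The bit(128) conversion assumes bit-string input accepts an 'x' hex prefix.
SELECT pg_column_size(decode(md5('example'), 'hex'))     AS bytea_bytes,
       pg_column_size(('x' || md5('example'))::bit(128)) AS bit_bytes,
       pg_column_size(md5('example')::uuid)              AS uuid_bytes;
```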
According to [1], PostgreSQL offers a single bytea operator: concatenation. You can append one bytea value to another bytea value using the concatenation operator ||. [1]
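[1] also states that you cannot compare two bytea values, even for equality/inequality. That does not hold for current PostgreSQL, which provides the usual comparison operators for bytea; the bytea primary key in the comparison below depends on them. You can also convert a bytea value into another value using CAST(), and that opens up other operators. [1]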
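A short sketch of the operators discussed above, using throwaway literals. It only uses built-in operators and functions; the values themselves carry no meaning.

```sql
-- Concatenation of two bytea values.
SELECT '\x0102'::bytea || '\x03'::bytea AS concatenated;

-- Equality and ordering comparisons on bytea.
SELECT '\xAA'::bytea = '\xAA'::bytea AS is_equal,
       '\x01'::bytea < '\x02'::bytea AS is_less;

-- Converting to another representation opens up text functions.
SELECT encode('\xAABBCC'::bytea, 'hex') AS as_hex_text;
```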
Comparison between UUID and bytea
create table u(uuid uuid primary key, payload character(300));
create table b( bytea bytea primary key, payload character(300));
INSERT INTO u
SELECT uuid_generate_v4()
FROM generate_series(1,1000*1000);
INSERT INTO b
SELECT random_bytea(16)
FROM generate_series(1,1000*1000);
VACUUM ANALYZE u;
VACUUM ANALYZE b;
## Your table size
SELECT pg_size_pretty(pg_total_relation_size('u'));
pg_size_pretty
----------------
81 MB
SELECT pg_size_pretty(pg_total_relation_size('b'));
pg_size_pretty
----------------
101 MB
## Speed comparison
\timing on
## Common select
select * from u limit 1000;
Time: 1.433 ms
select * from b limit 1000;
Time: 1.396 ms
## Random Select
SELECT * FROM u OFFSET random()*1000 LIMIT 10000;
Time: 42.453 ms
SELECT * FROM b OFFSET random()*1000 LIMIT 10000;
Time: 10.962 ms
Conclusion: I don't think there is much benefit to using UUID beyond its uniqueness and its smaller size (which should make inserts faster).
Note: No Index, there is only one connection
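To reproduce the comparison, note that random_bytea is not a built-in PostgreSQL function and uuid_generate_v4() comes from the uuid-ossp extension. The sketch below shows possible stand-ins, which are assumptions rather than what the answer actually ran: hashing a random number to get 16 pseudo-random bytes, and gen_random_uuid(), which is built in from PostgreSQL 13.

```sql
-- Assumed stand-in for random_bytea(16): 16 bytes from an MD5 of random().
-- Not cryptographically strong; adequate for filling a benchmark table.
SELECT decode(md5(random()::text), 'hex') AS sixteen_random_bytes
FROM generate_series(1, 5);

-- Assumed stand-in for uuid_generate_v4() on PostgreSQL 13 or later.
SELECT gen_random_uuid() AS random_uuid
FROM generate_series(1, 5);
```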