We need to be able to compute table hashes for an external environment and compare it to pre-computed hash from an internal environment. The use of this is to ensure that data in the external environment is not tampered by a "rogue" database administrator. Users insist this feature.
Currently, we do this by computing the individual hashes of each column value, perform bit-xor on the column hashes to get the row hash, then perform bit-xor on all the row hashes to come up with the table hash. Pseudo-script below:
cursor hash_cur is
select /*+ PARALLEL(4)*/ dbms_crypto.mac(column1_in_raw_type, HMAC_SH512, string_to_raw('COLUMN1_NAME')) as COLUMN1_NAME
...
from TABLE_NAME;
open hash_cur;
fetch hash_cur bulk collect into hashes;
close hash_cur;
for i in 1..hashes.count
loop
rec := hashes(i);
record_xor = rec.COLUMN1;
record_xor = bit_xor(record_xor, rec.COLUMN2);
...
record_xor = bit_xor(record_xor, rec.COLUMNN);
table_xor = bit_xor(table_xor, record_xor);
end loop;
The pseudo-script above will be run in parallel by using dbms_job.
Problem with this is that we have terabytes of data for certain tables and currently the performance does not meet the performance we want to achieve. Hashing must be done "on-the-fly" as users want to perform hash checking themselves.
It seems to me that the operation is more CPU-bound than I/O bound. I am thinking of storing the table data in a blob instead, where data is properly arranged by record, then by column. Then perform hash on the output file. This should make the operation completely I/O bound.
Any suggestions that can lead me to a better performing script would be greatly appreciated. Thanks.
In a lot of data management systems, a hash is used as a kind of data fingerprint, with the intent not to cryptographically defeat an opponent, but to quickly identify duplicate data by seeing if other data with the given hash already exists in the system.
Hashing is simply passing some data through a formula that produces a result, called a hash. That hash is usually a string of characters and the hashes generated by a formula are always the same length, regardless of how much data you feed into it. For example, the MD5 formula always produces 32 character-long hashes.
The cryptographic hashing of a value cannot be inverted to find the original value. Given a value, it is infeasible to find another value with the same cryptographic hash.
Hashing is the process of transforming any given key or a string of characters into another value. This is usually represented by a shorter, fixed-length value or key that represents and makes it easier to find or employ the original string. The most popular use for hashing is the implementation of hash tables.
First of all, I think the way to approach "rogue administrators" is with a combination of Oracle's audit trail and Database Vault features.
That said, here's what I might try:
1) Create a custom ODCI aggregate function to compute a hash of multiple rows as an aggregate.
2) Create a VIRTUAL NOT NULL
column on the table that was an SHA hash of all the columns in the table -- or all the one's you care about protecting. You'd keep this around all the time -- basically trading away some insert/update/delete
performance in exchange to be able to compute hashes more quickly.
3) Create a non-unique index on that virtual column
4) SELECT my_aggregate_hash_function(virtual_hash_column) FROM my_table
to get the results.
Here's code:
CREATE OR REPLACE TYPE matt_hash_aggregate_impl AS OBJECT
(
hash_value RAW(32000),
CONSTRUCTOR FUNCTION matt_hash_aggregate_impl(SELF IN OUT NOCOPY matt_hash_aggregate_impl ) RETURN SELF AS RESULT,
-- Called to initialize a new aggregation context
-- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead
-- of creating the new aggregation context from scratch
STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_hash_aggregate_impl) RETURN NUMBER,
-- Called when a new data point is added to an aggregation context
MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_hash_aggregate_impl, value IN raw ) RETURN NUMBER,
-- Called to return the computed aggragate from an aggregation context
MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_hash_aggregate_impl, returnValue OUT raw, flags IN NUMBER) RETURN NUMBER,
-- Called to merge to two aggregation contexts into one (e.g., merging results of parallel slaves)
MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_hash_aggregate_impl, ctx2 IN matt_hash_aggregate_impl) RETURN NUMBER,
-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_hash_aggregate_impl, value raw) RETURN NUMBER
);
/
CREATE OR REPLACE TYPE BODY matt_hash_aggregate_impl IS
CONSTRUCTOR FUNCTION matt_hash_aggregate_impl(SELF IN OUT NOCOPY matt_hash_aggregate_impl ) RETURN SELF AS RESULT IS
BEGIN
SELF.hash_value := null;
RETURN;
END;
STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_hash_aggregate_impl) RETURN NUMBER IS
BEGIN
sctx := matt_hash_aggregate_impl ();
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_hash_aggregate_impl, value IN raw ) RETURN NUMBER IS
BEGIN
IF self.hash_value IS NULL THEN
self.hash_value := dbms_crypto.hash(value, dbms_crypto.hash_sh1);
ELSE
self.hash_value := dbms_crypto.hash(self.hash_value || value, dbms_crypto.hash_sh1);
END IF;
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_hash_aggregate_impl, returnValue OUT raw, flags IN NUMBER) RETURN NUMBER IS
BEGIN
returnValue := dbms_crypto.hash(self.hash_value,dbms_crypto.hash_sh1);
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_hash_aggregate_impl, ctx2 IN matt_hash_aggregate_impl) RETURN NUMBER IS
BEGIN
self.hash_value := dbms_crypto.hash(self.hash_value || ctx2.hash_value, dbms_crypto.hash_sh1);
RETURN ODCIConst.Success;
END;
-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_hash_aggregate_impl, value raw) RETURN NUMBER IS
BEGIN
raise_application_error(-20001, 'Invalid operation -- hash aggregate function does not support windowing!');
END;
END;
/
CREATE OR REPLACE FUNCTION matt_hash_aggregate ( input raw) RETURN raw
PARALLEL_ENABLE AGGREGATE USING matt_hash_aggregate_impl;
/
create table mattmsi as select * from mtl_system_items where rownum <= 200000;
NOT NULL
alter table mattmsi add compliance_hash generated always as ( dbms_crypto.hash(to_clob(inventory_item_id || segment1 || last_update_date || created_by || description), 3 /*dbms_crypto.hash_sh1*/) ) VIRTUAL not null ;
create index msi_compliance_hash_n1 on mattmsi (compliance_hash);
SELECT matt_hash_aggregate(compliance_hash) from (select compliance_hash from mattmsi order by compliance_hash);
A few comments:
SUM()
over the row-level hashes,
because an attacker could forge the correct sum very easily.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With