In my MySQL DB I have a table defined like:
CREATE TABLE `mytablex_cs` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`tag` varchar(6) COLLATE utf8_bin NOT NULL DEFAULT '',
`value` text COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
KEY `kt` (`tag`),
KEY `kv` (`value`(200))
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I need to implement a UNIQUE constraint (key) on the value field.
I know that it is not yet possible to define a unique index on the entire value of a blob or text field, but there is a ticket(?) open to implement such a feature (see this page), where it has been suggested to create a unique key using a hash, as is already done for other fields.
Now I would like to use a similar approach: add another field to the table that will contain the hash, and create a unique key on that field.
I looked at possible ways to create this hash and, since I would like to avoid collisions (I need to insert several million entries), it seems that the RIPEMD-160 algorithm is the best one, even though a quick search turned up several similar solutions that use SHA256 or even SHA1 and MD5.
I have no background in cryptography, so what are the downsides of choosing this approach?
Another question I have is: which algorithm is currently used by MySQL to create the hash?
Let's look at your requirements:
You need to ensure that the value field is unique. The value field is a text column, and because of that there is no way to create a unique index on its full contents (for now). So using an extra field that holds the hash of the value is your only real option here.
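A minimal sketch of that extra-field approach, assuming SHA1 as the hash and a hypothetical column name `value_hash` and key name `uk_value_hash` (neither appears in the question). It also assumes the connection character set matches the column's utf8, so that hashing a literal and hashing the stored value see the same bytes.

```sql
-- value_hash and uk_value_hash are illustrative names, not from the question.
-- SHA1 produces 40 hex characters; UNHEX() packs them into 20 raw bytes.
ALTER TABLE `mytablex_cs`
  ADD COLUMN `value_hash` BINARY(20) NOT NULL AFTER `value`;

-- Backfill the hash for rows that already exist.
UPDATE `mytablex_cs`
   SET `value_hash` = UNHEX(SHA1(`value`));

-- This step fails with a duplicate-entry error if two existing rows
-- already hold the same value.
ALTER TABLE `mytablex_cs`
  ADD UNIQUE KEY `uk_value_hash` (`value_hash`);

-- New rows supply the hash alongside the value.
INSERT INTO `mytablex_cs` (`tag`, `value`, `value_hash`)
VALUES ('abc', 'some text', UNHEX(SHA1('some text')));
```

Running the same INSERT a second time should be rejected with a duplicate-key error (1062) on `uk_value_hash`, which is the uniqueness guarantee the question asks for. The application, or a trigger, has to keep `value_hash` in step with `value` on every insert and update.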
Advantages to this approach:
Disadvantages of this approach:
I assume you understand what a hash function does and conceptually how it works.
You can find a list of cryptographic functions here: http://dev.mysql.com/doc/refman/5.5/en//encryption-functions.html
As far as I know, MySQL supports the MD5, SHA, SHA1 and SHA2 hashing functions (SHA() is a synonym for SHA1()). Most if not all of these should be sufficient for plain hashing. Some functions, like MD5, have known weaknesses in cryptographic applications, e.g. when used as a signature algorithm in PKI. These weaknesses should not matter much when the hash only serves to create a unique value, since it is not really being applied in a cryptographic context here, provided the values are not supplied by someone deliberately trying to craft collisions.
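To put rough numbers on the risk of an accidental collision, the birthday approximation p ≈ n^2 / 2^(b+1) estimates the chance of at least one collision among n random values hashed to b bits. The row count of 10 million below is an assumption standing in for "several millions of entries", and the estimate only covers random input, not deliberately crafted collisions.

```sql
-- Birthday approximation: p ~= n^2 / 2^(b+1)
-- n = 1e7 rows is an assumed figure, not taken from the question.
SELECT POW(1e7, 2) / POW(2, 129) AS p_md5_128bit,   -- on the order of 1e-25
       POW(1e7, 2) / POW(2, 161) AS p_sha1_160bit;  -- on the order of 1e-35
```

Under these assumptions even the 128-bit MD5 output leaves accidental collisions vanishingly unlikely, so the choice between the functions matters mostly for storage size and hashing speed.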
To use the MySQL hashing functions you can try the following examples:
SELECT MD5('1234');
SELECT SHA('1234');
SELECT SHA1('1234');
SELECT SHA2('1234',224);
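These functions return the digest as a hexadecimal string, so the size of the column holding the hash depends on the function chosen. A quick way to check, as a sketch (the column aliases are illustrative, and SHA2 is only available from MySQL 5.5.5 on builds with SSL support):

```sql
-- Hex digest lengths in characters, and the raw size after UNHEX().
SELECT CHAR_LENGTH(MD5('1234'))        AS md5_hex_chars,     -- 32
       CHAR_LENGTH(SHA1('1234'))       AS sha1_hex_chars,    -- 40
       CHAR_LENGTH(SHA2('1234', 256))  AS sha256_hex_chars,  -- 64
       LENGTH(UNHEX(SHA1('1234')))     AS sha1_raw_bytes;    -- 20
```

Storing the UNHEX()ed digest in a BINARY column halves the size of the indexed field compared with keeping the hex string in a CHAR column.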
As with everything new, you should try each approach and find the one that works best in your case.