I am writing a simple tool to detect duplicate files (i.e. files containing the same data). The mechanism is to generate a hash for each file using the SHA-512 algorithm and then store these hashes in a MySQL database. I store the hashes in a BINARY(64) UNIQUE NOT NULL column. Each row will have a unique binary hash, which is used to check whether a file is a duplicate or not.
-- My questions are --
Can I use indexes on a binary column? My table uses the default collation, latin1.
Which indexing mechanism should I use, B-tree or hash, to get high performance? I need to add or update 100s of rows per second.
What other things should I take care of to get the best performance?
Can I use indexes on a binary column? My table uses the default collation, latin1.
Yes, you can. Collation is only relevant to character data types, not binary data types (it defines how characters are compared and ordered). Also, be aware that latin1 is a character encoding, not a collation.
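As a rough illustration of why collation does not matter here, the following Python sketch contrasts the two kinds of comparison. It is only an analogy: `collation_equal` is a hypothetical helper that uses case folding as a stand-in for a case-insensitive collation such as latin1_swedish_ci, and it is not how MySQL implements collations.

```python
# Stand-in for a case-insensitive collation (assumption: case folding only;
# real collations also define ordering and accent handling).
def collation_equal(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


# Binary data has no collation: values are compared byte by byte.
def binary_equal(a: bytes, b: bytes) -> bool:
    return a == b


text_match = collation_equal("ABC", "abc")    # equal under the collation stand-in
binary_match = binary_equal(b"ABC", b"abc")   # different bytes, so not equal
```

A hash stored in a binary column behaves like the second case, so two digests match only if every byte is identical.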
Which indexing mechanism should I use, B-tree or hash, to get high performance? I need to add or update 100s of rows per second.
Note that hash indexes are only available with the MEMORY and NDB storage engines, so you may not even have a choice.
In any event, either would typically be able to meet your performance criteria. For this particular application, though, I see no benefit in using B-tree (which is ordered), whereas hash would give better performance for the exact-match lookups you need. Therefore, if you have the choice, you may as well use hash.
See "Comparison of B-Tree and Hash Indexes" in the MySQL Reference Manual for more information.
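To make the trade-off concrete, here is a small Python analogy rather than anything MySQL-specific. A `set` plays the role of a hash index and a sorted list searched with `bisect` plays the role of an ordered (B-tree-like) index. The sample digests and all function names are made up for illustration.

```python
import bisect
import hashlib

# Hypothetical sample data: 1000 SHA-512 digests (64 bytes each).
digests = [hashlib.sha512(str(i).encode()).digest() for i in range(1000)]

# Hash-style index: supports equality lookups only.
hash_index = set(digests)

# Ordered index (B-tree stand-in): a sorted list searched by bisection.
ordered_index = sorted(digests)


def in_hash_index(digest: bytes) -> bool:
    return digest in hash_index


def in_ordered_index(digest: bytes) -> bool:
    pos = bisect.bisect_left(ordered_index, digest)
    return pos < len(ordered_index) and ordered_index[pos] == digest


def range_scan(low: bytes, high: bytes) -> list:
    """Return digests d with low <= d < high; needs the ordered index."""
    start = bisect.bisect_left(ordered_index, low)
    end = bisect.bisect_left(ordered_index, high)
    return ordered_index[start:end]


known = hashlib.sha512(b"42").digest()
unknown = hashlib.sha512(b"not stored").digest()
```

Both structures answer "is this digest already stored?", which is the only query a duplicate checker needs. Range scans are the one thing the ordered structure adds, and a range of hash values is not meaningful for this application.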
What other things should I take care of to get the best performance?
It depends on your definition of "best performance" and on your environment. In general, remember Knuth's maxim "premature optimisation is the root of all evil": that is, only optimise when you know that there will be a problem with the simplest approach.
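For reference, the simplest approach could look something like the sketch below. It makes several assumptions: an in-memory sqlite3 database stands in for MySQL so the example is self-contained, and the table name `file_hashes` and all function names are hypothetical. In MySQL the column would be declared BINARY(64) UNIQUE NOT NULL as in the question.

```python
import hashlib
import sqlite3


def sha512_digest(data: bytes) -> bytes:
    """Return the raw 64-byte SHA-512 digest (the size of a BINARY(64) column)."""
    return hashlib.sha512(data).digest()


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Hash a file in chunks so large files are not read into memory at once."""
    hasher = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()


def open_store() -> sqlite3.Connection:
    # Assumption: sqlite3 as a stand-in for MySQL; BLOB replaces BINARY(64).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_hashes (hash BLOB UNIQUE NOT NULL)")
    return conn


def is_duplicate(conn: sqlite3.Connection, digest: bytes) -> bool:
    """Try to insert the digest; a unique-constraint violation means a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO file_hashes (hash) VALUES (?)", (digest,))
        return False
    except sqlite3.IntegrityError:
        return True


conn = open_store()
first = is_duplicate(conn, sha512_digest(b"hello world"))
second = is_duplicate(conn, sha512_digest(b"hello world"))
third = is_duplicate(conn, sha512_digest(b"something else"))
```

Letting the unique constraint reject the insert means each file costs one statement instead of a separate lookup followed by an insert. Whether that matters at your write rate is something to measure before optimising further.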