What would be the performance penalty of using strings as primary keys instead of bigints etc.? String comparison is much more expensive than integer comparison, but on the other hand I can imagine that internally a DBMS will compute hash keys to reduce the penalty.
An application I work on uses strings as primary keys in several tables (MySQL). It is not trivial to change this, and I'd like to know what can be gained performance-wise to justify the work.
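For concreteness, this is the kind of trade-off being asked about. The sketch below is purely illustrative (MySQL); the table and column names are made up and do not come from the application in question:

    -- Natural string key, as the application uses today:
    CREATE TABLE product_str (
        product_code VARCHAR(64) NOT NULL,    -- e.g. 'ACME-WIDGET-2024'
        name         VARCHAR(255) NOT NULL,
        PRIMARY KEY (product_code)
    ) ENGINE=InnoDB;

    -- Surrogate bigint key, with the string demoted to a unique column:
    CREATE TABLE product_int (
        product_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        product_code VARCHAR(64) NOT NULL,
        name         VARCHAR(255) NOT NULL,
        PRIMARY KEY (product_id),
        UNIQUE KEY uq_product_code (product_code)
    ) ENGINE=InnoDB;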
on the other hand I can imagine that internally a DBMS will compute hash keys to reduce the penalty.
The DB needs to maintain a B-Tree (or a similar structure) over the keys so that they are kept in order.
If the key were hashed and the hash stored in the B-Tree, that would still be fine for rapidly checking the uniqueness of the key -- the key could still be looked up efficiently. But you would not be able to search efficiently for a range of data (e.g. with LIKE) because the B-Tree would no longer be ordered according to the string value.
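To make that concrete, here is the kind of query that relies on the index being ordered by the actual string values; an index over hashes of the key could not serve it. The table and column names are the illustrative ones from the sketch above, not anything from the original application:

    -- A prefix search can be answered by a range scan over the ordered string key:
    SELECT product_code, name
    FROM   product_str
    WHERE  product_code LIKE 'ACME-%';

    -- An exact-match lookup, by contrast, could be served by either representation:
    SELECT name
    FROM   product_str
    WHERE  product_code = 'ACME-WIDGET-2024';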
So I think most databases really do store the string itself in the B-Tree, which can (1) take more space than numeric values and (2) require the B-Tree to be re-balanced if keys are inserted in arbitrary order (there is no notion of monotonically increasing values as with a numeric PK).
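The insert-order point can be illustrated as follows (again using the hypothetical tables above, MySQL). Monotonic numeric keys always append at the rightmost leaf of the B-Tree, while random string keys such as UUIDs land at arbitrary positions and force page splits throughout the tree:

    -- Auto-increment key: 1, 2, 3, ... new rows always go to the last page.
    INSERT INTO product_int (product_code, name)
    VALUES ('ACME-WIDGET-2024', 'Widget');

    -- Random string key (here a UUID): each insert lands at an arbitrary
    -- position in the key order, causing splits/re-balancing over time.
    INSERT INTO product_str (product_code, name)
    VALUES (UUID(), 'Widget');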
The penalty in practice can range from insignificant to huge. It all depends on the usage, the number of rows, the average size of the string key, the queries that join the tables, etc.
In our product we use varchar(32) for primary keys (GUIDs) and we haven't run into performance issues because of this. Our product is a web site under extreme load for which stability is critical. We use SQL Server 2005.
Edit: Our biggest tables have more than 3,000,000 records, with lots of inserts and selects against them. I think that in general the benefit of migrating to an int key will be very low, while the cost and risk of the migration will be very high.
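Since the original question concerns MySQL, one way to gauge how much there is to gain before committing to a migration is to look at how much space the indexes currently take (in InnoDB the primary key value is repeated in every secondary index, so a wide string key inflates all of them). A rough sketch, assuming access to information_schema; the schema name is a placeholder and the reported sizes are approximate:

    SELECT table_name,
           ROUND(data_length  / 1024 / 1024) AS data_mb,
           ROUND(index_length / 1024 / 1024) AS index_mb
    FROM   information_schema.tables
    WHERE  table_schema = 'your_database'   -- replace with the real schema name
    ORDER  BY index_length DESC;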