The data structure used for indexing in a DB table is B-Tree (default, out of B-Tree, R-Tree, Hash). Since look-ups, deletions, and insertions can all be done in logarithmic time in a B-Tree, then why is only reading from an indexed table is faster and but writing is slower?
Indexes are only used for speeding up SELECT statements. For INSERT, UPDATE, and DELETE your statements will be slower than normal due to the index needing to be updated as part of the statement.
I should maybe clarify on the UPDATE/DELETE point. It is true that the statements will be slowed down due to the change to the index added to the overhead, however the initial lookup part (WHERE) of the UPDATE and DELETE statement could be sped up due to the index. Basically any place a WHERE clause is used and you reference the indexed fields, the record selection part of that statement should see some increase.
Also, if an UPDATE statement does not alter any of the columns that are part of an index then you should not see any additional slowness as the index is not being updated.
Because indexes require additional disk space. Indexes increase the amount of data that needs to be logged and written to a database. Indexes reduce write performance. When a column covered by an index is updated, that index also must be updated. Similarly any deletes or insert requires updating the relevant indexes.
The disk space and write penalties of indexes is precisely why you need to be careful about creating indices.
That said, updates to non-indexed columns can have their performance improved with indexes.
This:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
Will be faster than this:
UPDATE Table SET IndexedColumn = 'Value' WHERE IndexedKey = 'KeyValue'
But the above two will likely both be faster than this in any reasonably sized table:
UPDATE Table SET NonIndexedColumn = 'Value' WHERE NonIndexedKey = 'KeyValue'
Deletes, especially single deletes, can similarly be faster even though the table and the indexes need to be updated. This is simply because the query engine can find the target row(s) faster. That is, it can be faster to read an index, find the row, and remove the row and update the index, instead of scanning the entire table for the correct rows and removing the relevant ones. However, even in this case there is going to be more data to write; it's just that the IO cost of scanning an entire table could be fairly high compared to an index.
Finally, in theory, a clustering key that spreads inserts across multiple disk pages can allow the system to support more concurrent inserts since inserts typically require page locks to function, but that is a somewhat uncommon situation and it may result in worse read performance due to fragmenting your clustered indexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With