why is scaling writes to a relational database virtually impossible?

2 Answers

A sharded database is actually quite different to a normal SQL database. In a lot of ways it is more like a custom NoSQL system that just happens to use a database for storage. Unless your dataset consists of a lot of completely disconnected subsets, most queries more complex than get by ID won't work the same as they do on a single node database.

The other reason is that SQL writes tend to be fairly expensive due to the requirement for immediate consistency - the indexes that are required for decent read performance on a large database get updated as part of the write operation, and various constraints are checked. In systems designed for horizontal scalability these additional operations are usually either skipped entirely or performed separately from the write.

answered Oct 06 '22 00:10

Tom Clarkson

The slowness of physical disk subsystems is usually the single greatest challenge to overcome when trying to scale a database to service a very large number of concurrent writers. But it is not "virtually impossible" to optimize writes to a relational database. It can be done. Yet there is a trade-off: when you optimize writes, selects of large subsets of logically related data usually are slower.

The writes of the primary data to disk and the rebalancing of index trees can be disk-intensive. The maintenance of clustered indexes, whereby rows that belong logically together are stored physically contiguous on disk, is also disk-intensive. Such indexes make selects (reads) quicker while slowing writes. A heavily indexed table does not scale well therefore, and the lower the cardinality of the index, the less well it scales.

One optimization aimed at improving the speed of concurrent writers is to use sparse tables with hashed primary keys and minimal indexing. This approach eliminates the need for an index on the primary key value and permits an immediate seek to the disk location where a row lives, 'immediate' in the sense that the intermediary of an index read is not required. The hashed primary key algorithm returns the physical address of the row using the primary key value itself-- a simple computation that requires no disk access.

The sparse table is exactly the opposite of storing logically related data so they are physically contiguous. In a sparse table, writers do not step on each others toes, so to speak. Writes are like raindrops falling on a large field not like a crowd of people on a subway platform trying to step into the train through a few open doors. The sparse table helps to eliminate write bottlenecks.

However, because logically related data are not physically contiguous, but scattered, the act of gathering all rows in a certain zipcode, say, is expensive. This sparse-table hashed-pk optimization is therefore optimal only when the predominant activity is the insertion of records, the update of individual records, and the lookup of data relating to a single entity at a time rather than to a large set of entities, as in, say, an order-entry system. A company that sold merchandise on TV and had to service tens of thousands of simultaneous callers placing orders would be well served by a system that used sparse tables with hashed primary keys. A national security database that relied upon linked lists would also be well served by this approach. Many social networking applications could also use it to advantage.

answered Oct 05 '22 22:10

Tim

Related questions
                            
                                How to find the previous and next record using a single query in MySQL?
                            
                                MySQL Text Column being truncated
                            
                                How to order by maximum of two column which can be null in MySQL?
                            
                                how to connect to database on another server
                            
                                can we insert into two tables with single sql statement?
                            
                                if mysql_query() fails, what to do?
                            
                                MySQL: Unique constraint on multiple fields [duplicate]
                            
                                SQL Join to only the maximum row puzzle
                            
                                How to: database versioning with maven2?
                            
                                How to NOT select empty string
                            
                                mysql: searching BETWEEN dates stored as varchar
                            
                                return count 0 with mysql group by
                            
                                Getting the first row of the mysql resource string?
                            
                                How can I exclude data for certain tables but keep structure with mysqldump?
                            
                                mysql- can I do INSERT IGNORE with multiple values?
                            
                                How to access mysql database using shell script?
                            
                                What is the maximum range of varchar in MySQL?
                            
                                mySQL select differences between two tables in different databases
                            
                                MySQL DATETIME DIFF query
                            
                                Converting latin1_swedish_ci to utf8 with PHP

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

why is scaling writes to a relational database virtually impossible?

Tags:

database

sql-server

mysql

nosql

cassandra

totsum

People also ask

2 Answers

Tom Clarkson

Tim

Recent Activity

Donate For Us