OK, dumb question, I know, but I keep seeing the nebulous phrase 'a large database' (as well as 'small' and 'medium'), and I wonder what that actually means. Can someone define what small, medium, and large databases are for us SQL neophytes?
Big data databases store petabytes of unstructured, semi-structured, and structured data without rigid schemas. They are mostly NoSQL (non-relational) databases built on a horizontally scalable architecture, which enables quick, cost-effective processing of large volumes of data and many concurrent queries.
"Big data" is a term relative to the computing and storage power available on the market at the time: in 1999, one gigabyte (1 GB) was considered big data. Today it may mean petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of information, covering billions or even trillions of records from millions of people.
What are examples of big data? Big data comes from myriad sources; examples include transaction processing systems, customer databases, documents, emails, medical records, internet clickstream logs, mobile apps, and social networks.
There isn't a threshold where a small database becomes medium or a medium database becomes large. Generally, when I hear these terms, I think of particular orders of magnitude in terms of total records being stored.
As poster dkretz suggested, you could also think about it in terms of the properties each kind of database has. Categorizing it this way, I'd say:
Small: Performance is not a concern. Your queries run fine without any special optimizations, and you see only a marginal difference from front-line enhancements like indexes (see the sketch after this list).
Medium: Your database probably has one or more staff members assigned part-time to its maintenance and care. These people pay attention to the database's health; their primary administrative responsibility is to prevent unacceptable performance problems and to minimize downtime.
Large: Probably has dedicated staff member(s) whose job is to work on the database and improve performance, and to make sure that application changes don't break the schema over the database's lifetime. Metrics about the health and status of the database are monitored closely. Significant expertise is required to understand and perform optimizations.
Very large: The database stores vast amounts of information that must be readily accessible. Performance optimizations are absolutely required to wring every last ounce of speed out of each query; without them, the database would be much less usable, or even impossible to use. It may rely on sophisticated or innovative replication or clustering techniques that push the boundaries of current technology.
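To make the index point concrete, here is a minimal sketch in PostgreSQL-style SQL. The orders table and all names in it are hypothetical, invented purely for illustration, not taken from any of the posts above:

    -- Hypothetical table for illustration only.
    CREATE TABLE orders (
        id          BIGINT PRIMARY KEY,
        customer_id BIGINT NOT NULL,
        created_at  TIMESTAMP NOT NULL,
        total       NUMERIC(10, 2) NOT NULL
    );

    -- On a "small" database this query is fast either way;
    -- on a "medium" one, the index below is what keeps it fast.
    EXPLAIN ANALYZE
    SELECT id, total
    FROM orders
    WHERE customer_id = 42;

    -- The front-line enhancement: a plain B-tree index on the filter column.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- Re-running EXPLAIN ANALYZE should now show an index scan
    -- instead of a sequential scan over the whole table.

Whether that index scan is merely nice to have or the difference between milliseconds and minutes is, roughly, the line between "small" and "medium" in the sense above.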
Note that these are entirely subjective, and that someone may very well have a perfectly legitimate alternate definition of "large".
One way to figure it is by observing your test queries.
A small database is one where indexes don't matter.
A medium database is one where queries take longer than one second if you don't have an appropriate index in place.
A big database is one where queries often take hours to optimize, using a combination of query design, index modification, and many test cycles.
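As a rough illustration of that test cycle, here is a sketch of a single iteration, again assuming PostgreSQL syntax; the events table and index names are hypothetical and only stand in for whatever schema you are actually tuning:

    -- One iteration of the tuning cycle described above (illustrative names).
    -- 1. Measure: ask the planner what it actually does.
    EXPLAIN ANALYZE
    SELECT user_id, count(*)
    FROM events
    WHERE created_at >= '2024-01-01'
    GROUP BY user_id;

    -- 2. Modify: a composite index matching the filter and grouping columns.
    CREATE INDEX idx_events_created_user ON events (created_at, user_id);

    -- 3. Re-test: run EXPLAIN ANALYZE again and compare timings.
    -- On a "big" database, steps 1-3 repeat many times, combined with
    -- rewriting the query itself, before the plan is acceptable.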