Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What exactly is the problem with hierarchal and network models of databases?

Before E F Codd published his paper "A Relational Model of Data for Large Shared Data Banks" in 1970, hierarchal and network were the two prominent models of the database.

What exactly was wrong with them that they did not prevail?

like image 930
Moeb Avatar asked Dec 13 '22 20:12

Moeb


1 Answers

The answers so far cover a lot of the practical reasons why the network and hierarchical models were eventually displaced by the relational model (including SQL database systems). Codd's 1970 paper explains why a new model is needed, in detail. It's a great read. Indeed, before Codd, the term "data model" was practically unheard of. So he coined the terms "hierachical model" and "network model" in order to describe database systems that had been constructed with no precise model in mind.

The hierarchical and network models can be collected into a general term, called the "graph model". The essential feature of the graph model of data is that data items are referenced by stating their location. If you understand pointers, you understand everything fundamental about the graph model.

There are two very powerful advantages to the graph model of data. The first is that it's very easy for programmers to grasp. Novice programmers go through a certain learning curve coming to grips with pointers, but once they've done that, they are ready to understand graph data easily.

The second advantage is that pointers are extremely fast, provided that the navigation path to be followed was anticipated at the time the data was written.

There are several disadvantages to using pointers to identify data. One is that the data becomes "pinned". That is, when the data is to be shuffled all of the pointers that reference the data have to be located and updated. Or a "forwarding address" has to be left at the old location. If you've ever been in the web and clicked on a button that has always worked, only to be greeted with the infamous "page not found" error, you've probably come across the pitfall of shuffling pinned data without updating references to it.

A second one is that navigating data along unplanned access paths can be downright disastrous, both in terms of performance, and in terms of logical correctness. This is one of the reasons why ad hoc reporting is so difficult with graph databases.

A third drawback of graph data is that there may be logical relationships in the graph data that are not inherent in the data as given. The fundamental advantage of the relational model is that all the relationships are inherent in the data itself. The reason why this is an advantage is complex. I refer you again to the 1970 paper.

In all the "relational DBMSes" that you and I are likely to use, there is a bridge between using data to identify data and using pointers to locate data. It's called an index. The index relates two items: an index key (one or more columns from a table), and a pointer (that locates a row containing the index key). I'm glossing over all the details about indexes.

Anyway an index allows the SQL engine to translate a query that states what data is being sought into where to look for that data. Data that is pointed to by indexes can still be shuffled, but the index has to be rebuilt as part of the process.

This is an overview.

like image 173
Walter Mitty Avatar answered Mar 02 '23 01:03

Walter Mitty