
When to use composite types and arrays and when to normalize a database?

Is there any guideline on when to normalize a database or just use composite types and arrays?

When using arrays and composite types, I can use just a single table. I can also normalize the database and use a couple of tables and joins.

How do you decide which option is best?

asked Feb 07 '14 by rve



1 Answer

Most of the time, stick to normalization. Among other things, keeping your database fairly well normalized helps with lock granularity. For example, if you have a "parent" object with two arrays in it, two transactions cannot concurrently add, update, or delete members of those arrays, because every change must rewrite the same parent row. If they're regular side tables, they can (see the sketch below). You can still SELECT ... FOR UPDATE the parent row before updating child rows if you want the serialized behaviour, though.
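
To make that concrete, here's a minimal sketch (the table and column names are just illustrative) contrasting an array-in-parent layout with a normalized side table:

    -- Denormalized: tags live inside the parent row, so two transactions
    -- that each change one tag must both rewrite (and lock) the same row.
    CREATE TABLE parent_denorm (
        id   serial PRIMARY KEY,
        tags text[]
    );

    -- Normalized: each tag is its own row, so concurrent transactions can
    -- touch different tags without blocking each other.
    CREATE TABLE parent (
        id serial PRIMARY KEY
    );
    CREATE TABLE parent_tag (
        parent_id integer NOT NULL REFERENCES parent(id),
        tag       text NOT NULL,
        PRIMARY KEY (parent_id, tag)
    );

    -- And if you *do* want whole-object serialization with side tables,
    -- lock the parent row first:
    BEGIN;
    SELECT id FROM parent WHERE id = 1 FOR UPDATE;
    UPDATE parent_tag SET tag = 'new' WHERE parent_id = 1 AND tag = 'old';
    COMMIT;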

Updating an array to add/replace/delete a value is expensive, as PostgreSQL must rewrite the whole tuple the array is in as an MVCC update. (It has a few TOAST tricks up its sleeve that can help, but not tons). Ditto composite types embedded in rows.
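
Continuing the sketch above (same hypothetical tables), the cost difference shows up even for a one-element change:

    -- Appending one element rewrites the whole tuple as a new row version
    -- (the old version lingers until VACUUM):
    UPDATE parent_denorm SET tags = array_append(tags, 'extra') WHERE id = 1;

    -- The normalized equivalent writes one small row instead:
    INSERT INTO parent_tag (parent_id, tag) VALUES (1, 'extra');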

Big wide rows full of arrays and composites mean slower table scans, meaning slower fetches for commonly used values.

IIRC you can't define a foreign key into a field of a composite type, so you'll find yourself working around that or giving up on referential integrity where it'd be good to have it. Ditto arrays (there was work to get foreign keys to array elements working, but I don't think it was ever committed).
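
For example (illustrative names again), there's no way to declare integrity for array elements, while the join table gets it for free:

    -- No REFERENCES possible on the elements of tag_ids, so nothing
    -- stops a dangling tag id here:
    CREATE TABLE post_denorm (
        id      serial PRIMARY KEY,
        tag_ids integer[]
    );

    -- The normalized join table gets real referential integrity:
    CREATE TABLE tag (
        id   serial PRIMARY KEY,
        name text NOT NULL
    );
    CREATE TABLE post (
        id serial PRIMARY KEY
    );
    CREATE TABLE post_tag (
        post_id integer NOT NULL REFERENCES post(id),
        tag_id  integer NOT NULL REFERENCES tag(id),
        PRIMARY KEY (post_id, tag_id)
    );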

Many client drivers (PgJDBC, psqlODBC, psycopg2, etc.) have incomplete or nonexistent support for arrays and composites, so you'll often end up expanding them into tuples for client driver interaction anyway. Some things, like arrays of composite types, are really quite painful to work with.

Most ORMs, including common ones like Hibernate, totally suck at using anything beyond the most simplistic lowest-common-denominator SQL features. Sooner or later, someone's going to want to point one of those at your data model, at which point much wailing and gnashing of teeth will ensue. OTOH, don't accommodate garbage ORMs to the point where you avoid features that'd greatly improve the data model and solve real-world problems. For example, if you have the choice of storing native hstore fields or using an EAV schema, consider just using hstore (or better, in 9.4, jsonb with hstore-like features); see the sketch below.
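
As a rough sketch of that trade-off (names are made up), compare an EAV table with a single indexable hstore column:

    -- EAV: everything is text, and reassembling one object means pivoting:
    CREATE TABLE item_attr (
        item_id integer NOT NULL,
        attr    text    NOT NULL,
        value   text,
        PRIMARY KEY (item_id, attr)
    );

    -- hstore (or jsonb in 9.4+): one column, indexable with GIN:
    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE TABLE item (
        id    serial PRIMARY KEY,
        attrs hstore
    );
    CREATE INDEX item_attrs_idx ON item USING gin (attrs);

    -- Find items with a given key/value pair:
    SELECT id FROM item WHERE attrs @> 'color => red';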

(Perversely, this means that people who have the most "object-oriented" programs often have the most purely relational databases, because their tools suck.)

Things like report-generation tools will similarly struggle with composites and arrays, so you'll often end up creating views that present a normalized appearance for the DB anyway, then adding INSTEAD OF INSERT OR UPDATE OR DELETE triggers on the views to enable writes. At which point it gets ugly.
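
Here's roughly what that view-plus-trigger arrangement looks like (using the hypothetical tables from earlier):

    -- A view that flattens the array into rows for reporting tools...
    CREATE VIEW parent_tags_v AS
    SELECT p.id AS parent_id, t.tag
    FROM parent_denorm p, unnest(p.tags) AS t(tag);

    -- ...plus an INSTEAD OF trigger to make the view writable:
    CREATE FUNCTION parent_tags_v_ins() RETURNS trigger AS $$
    BEGIN
        UPDATE parent_denorm
           SET tags = array_append(tags, NEW.tag)
         WHERE id = NEW.parent_id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER parent_tags_v_ins
    INSTEAD OF INSERT ON parent_tags_v
    FOR EACH ROW EXECUTE PROCEDURE parent_tags_v_ins();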

Personally I recommend keeping composites for times when it's logical to model something as a "type". Consider, say, if your data model required you to track timestamps in their original time zone. There's no built-in type for this (no, that's not what "timestamp with time zone" does, despite the name; thanks, SQL committee), so you might create a composite type that stores (timestamp without time zone, tzname) and use that consistently in your data model.
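
A minimal version of that type might look like this (the type and column names are just for illustration):

    -- A wall-clock timestamp plus the zone it was recorded in, which is
    -- exactly what "timestamp with time zone" does NOT preserve:
    CREATE TYPE ts_with_origin AS (
        ts     timestamp without time zone,
        tzname text
    );

    CREATE TABLE event (
        id       serial PRIMARY KEY,
        happened ts_with_origin
    );

    INSERT INTO event (happened)
    VALUES (ROW('2014-02-07 14:02:00', 'Europe/Amsterdam')::ts_with_origin);

    -- Reconstruct an absolute instant when you need one:
    SELECT (happened).ts AT TIME ZONE (happened).tzname FROM event;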

Similarly, I tend to use arrays in queries a lot, but not in the data model much. They're useful when you want to intentionally denormalize something for performance, but that's often done in a materialized view or similar. Even if it's a change to the main data model, it's the sort of thing you should do based on a proper performance review, not just "optimizing" things you don't yet know are slow.
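
For instance, a deliberate, measured denormalization could live in a materialized view over the normalized tables (reusing the illustrative post/tag schema from above):

    -- Precompute the tag list per post; refresh when the base data changes.
    CREATE MATERIALIZED VIEW post_with_tags AS
    SELECT p.id, array_agg(t.name ORDER BY t.name) AS tag_names
    FROM post p
    JOIN post_tag pt ON pt.post_id = p.id
    JOIN tag t ON t.id = pt.tag_id
    GROUP BY p.id;

    REFRESH MATERIALIZED VIEW post_with_tags;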

answered Nov 11 '22 by Craig Ringer