Database design - empty fields [closed]

Tags: mysql, database-design

I am currently debating an issue with my dev team. They believe that empty fields are bad news. For instance, suppose we have a customer details table that stores data for customers from different countries, and each country has a slightly different address configuration plus 1-2 extra fields. For example, French customer details may also store an entry code, a floor/level and a title (madame, etc.), while South African details would have a security number. And so on.

Given that we're talking about minor variances, my idea is to put all of the fields into the table and use whichever ones each form needs.
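A minimal sketch of the single-table approach, assuming an in-memory SQLite database as a stand-in for MySQL; the table and column names (`customer_details`, `entry_code`, `floor_level`, `security_no`) are hypothetical. Country-specific columns are simply nullable, and each row leaves the ones it doesn't use as NULL.

```python
import sqlite3

def make_wide_schema(conn: sqlite3.Connection) -> None:
    # One table for all countries; country-specific columns are nullable.
    # Column names are hypothetical, chosen to match the question's examples.
    conn.execute("""
        CREATE TABLE customer_details (
            id          INTEGER PRIMARY KEY,
            country     TEXT NOT NULL,
            name        TEXT NOT NULL,
            street      TEXT NOT NULL,
            city        TEXT NOT NULL,
            postcode    TEXT,
            title       TEXT,  -- e.g. 'Madame' (France)
            entry_code  TEXT,  -- France only
            floor_level TEXT,  -- France only
            security_no TEXT   -- South Africa only
        )
    """)

conn = sqlite3.connect(":memory:")
make_wide_schema(conn)
conn.executemany(
    "INSERT INTO customer_details "
    "(country, name, street, city, postcode, title, entry_code, floor_level, security_no) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("FR", "Dupont", "12 rue de la Paix", "Paris", "75002", "Madame", "4821B", "3", None),
        ("ZA", "Nkosi", "5 Long St", "Cape Town", "8001", None, None, None, "SA-1234"),
    ],
)

# Unused country-specific columns come back as None (SQL NULL).
rows = conn.execute(
    "SELECT country, entry_code, security_no FROM customer_details ORDER BY id"
).fetchall()
```

Here `rows` holds one French and one South African customer, each with NULL in the other country's column.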

My colleague believes we should have a separate table for the extra data, e.g. customer_info_fr. But this seems to totally defeat the purpose of having a combined table in the first place.
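For comparison, a rough sketch of the colleague's extension-table design, again assuming SQLite as a stand-in and hypothetical column names. The extension table itself needs no nullable columns, but reading a complete record requires an outer join, and non-French customers still come back with NULLs in the result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_details (
        id      INTEGER PRIMARY KEY,
        country TEXT NOT NULL,
        name    TEXT NOT NULL,
        city    TEXT NOT NULL
    );
    -- At most one row per French customer; every column here is NOT NULL.
    CREATE TABLE customer_info_fr (
        customer_id INTEGER PRIMARY KEY REFERENCES customer_details(id),
        title       TEXT NOT NULL,
        entry_code  TEXT NOT NULL,
        floor_level TEXT NOT NULL
    );
    INSERT INTO customer_details VALUES (1, 'FR', 'Dupont', 'Paris');
    INSERT INTO customer_details VALUES (2, 'ZA', 'Nkosi', 'Cape Town');
    INSERT INTO customer_info_fr VALUES (1, 'Madame', '4821B', '3');
""")

# Reassembling the full record needs a LEFT JOIN; customer 2 has no
# customer_info_fr row, so its entry_code comes back as None.
rows = conn.execute("""
    SELECT d.id, d.country, f.entry_code
    FROM customer_details AS d
    LEFT JOIN customer_info_fr AS f ON f.customer_id = d.id
    ORDER BY d.id
""").fetchall()
```

Each additional country would add another table and another join to this query.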

The argument is that empty fields/columns are bad, but I'm struggling to find justification in database design principles either for or against this argument, or for any preferred solution.

Another option is a separate mini EAV table that stores the extra data in parent_id, key, val fields. Or we could serialise the extra data into an extra_data column in the main customer_data table.
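A minimal sketch of both alternatives side by side, assuming SQLite and JSON as the serialisation format (the question doesn't specify one). The table name `customer_extra` is hypothetical. In both cases the database can no longer type-check or constrain the individual extra fields; the application reassembles them.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_details (
        id         INTEGER PRIMARY KEY,
        country    TEXT NOT NULL,
        name       TEXT NOT NULL,
        extra_data TEXT            -- option B: serialised extras (JSON here)
    );
    -- Option A: a mini EAV table with one row per (customer, attribute).
    CREATE TABLE customer_extra (
        parent_id INTEGER NOT NULL REFERENCES customer_details(id),
        key       TEXT    NOT NULL,
        val       TEXT,
        PRIMARY KEY (parent_id, key)
    );
""")

extras = {"entry_code": "4821B", "floor": "3"}
conn.execute("INSERT INTO customer_details VALUES (1, 'FR', 'Dupont', ?)",
             (json.dumps(extras),))
conn.executemany("INSERT INTO customer_extra VALUES (1, ?, ?)", list(extras.items()))

def extras_from_eav(conn, customer_id):
    # Pivot the key/value rows back into a dict in application code.
    return dict(conn.execute(
        "SELECT key, val FROM customer_extra WHERE parent_id = ?", (customer_id,)))

def extras_from_json(conn, customer_id):
    # Deserialise the whole blob; individual fields are opaque to SQL.
    (blob,) = conn.execute(
        "SELECT extra_data FROM customer_details WHERE id = ?", (customer_id,)).fetchone()
    return json.loads(blob) if blob else {}
```

Both functions return the same dict here; the difference is that the EAV rows can at least be filtered with plain SQL (`WHERE key = 'entry_code'`), while the JSON blob needs database-specific JSON functions or application-side parsing.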

I think I'm confused because what I'm discussing isn't covered by 3NF, which is what I would typically use as a reference for how to structure data.

So, specifically, my question is:

If you have slight variances in data for each record (1-2 different fields for instance) what is the best way to proceed?


asked May 01 '10 20:05

user307927

2 Answers

There is definitely a school of thought which holds that NULL fields are bad, in and of themselves. Relational theory demands that databases consist of facts, and NULLs are the absence of fact. So, a rigorously designed database would have no nullable columns.

Your colleague is proposing something which is on the road to 6th Normal Form, where all the tables consist of a primary key and at most one other column. However, in such a schema we wouldn't have tables called customer_info_fr. That's not normalised: many countries might include ENTRY_CODE in their addresses. So we would need address_entry_codes and address_floor_numbers. Not to mention address_building_number and address_building_name, as some places are identified by number and others by name.

It's completely accurate and truthful as a logical design. Alas, from a physical perspective it is Teh Suck! The simplest query, select * from addresses, becomes a multi-table join, and outer joins at that. Nullable columns are a way of reconciling ugly design with the hard truth: "you cannae break the laws of physics". Nullable columns allow us to combine disjoint data sets into a single table, albeit at the cost of handling nulls (they can affect data retrieval, index usage, maths, etc.).
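To make the "multi-table join, and outer joins at that" point concrete, here is a rough sketch of a 6NF-style layout, assuming SQLite and table names loosely modelled on the ones above (all hypothetical). Every optional attribute lives in its own key-plus-one-column table with no nullable columns, yet the equivalent of `select * from addresses` needs one LEFT JOIN per attribute and still returns NULLs wherever a fact is absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY);
    -- One table per optional attribute: primary key plus one NOT NULL value.
    CREATE TABLE address_entry_codes      (address_id INTEGER PRIMARY KEY, entry_code      TEXT NOT NULL);
    CREATE TABLE address_floor_numbers    (address_id INTEGER PRIMARY KEY, floor_number    TEXT NOT NULL);
    CREATE TABLE address_building_numbers (address_id INTEGER PRIMARY KEY, building_number TEXT NOT NULL);
    CREATE TABLE address_building_names   (address_id INTEGER PRIMARY KEY, building_name   TEXT NOT NULL);

    INSERT INTO addresses VALUES (1), (2);
    INSERT INTO address_entry_codes      VALUES (1, '4821B');
    INSERT INTO address_floor_numbers    VALUES (1, '3');
    INSERT INTO address_building_numbers VALUES (1, '12');
    INSERT INTO address_building_names   VALUES (2, 'Rose Cottage');
""")

# The logical equivalent of "select * from addresses": one outer join per attribute.
ALL_ADDRESSES = """
    SELECT a.id, e.entry_code, f.floor_number, bn.building_number, bm.building_name
    FROM addresses AS a
    LEFT JOIN address_entry_codes      AS e  ON e.address_id  = a.id
    LEFT JOIN address_floor_numbers    AS f  ON f.address_id  = a.id
    LEFT JOIN address_building_numbers AS bn ON bn.address_id = a.id
    LEFT JOIN address_building_names   AS bm ON bm.address_id = a.id
    ORDER BY a.id
"""
rows = conn.execute(ALL_ADDRESSES).fetchall()
```

Address 1 has a number but no name and address 2 the reverse, so the result rows contain NULLs even though no stored column is nullable.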

Some designs attempt to get around the use of nulls by applying magic values. That is, if we don't know the correct value for some column we inject a default value which is a value but also means "unknown". A common instance of this is date '9999-12-31' as an open-ended TO_DATE in a FROM-TO date range. As long as everybody understands and adheres to the convention it's not a problem. It becomes a problem when some tables have date '9999-12-01' or date '9999-01-31' instead.
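A small sketch of how the magic-date convention breaks, assuming SQLite and a hypothetical `price_history` table. A query written against the agreed sentinel silently drops the row where someone used a different "magic" date.

```python
import sqlite3

OPEN_ENDED = "9999-12-31"  # hypothetical project-wide convention for "no end date yet"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_history (item TEXT, from_date TEXT, to_date TEXT NOT NULL)")
conn.executemany("INSERT INTO price_history VALUES (?, ?, ?)", [
    ("widget", "2010-01-01", "9999-12-31"),  # follows the convention
    ("gadget", "2010-03-01", "9999-12-01"),  # a different "magic" date slipped in
    ("gizmo",  "2009-01-01", "2009-12-31"),  # a genuinely closed range
])

def current_items(conn):
    # Written against the convention, so 'gadget' is silently missed.
    return [r[0] for r in conn.execute(
        "SELECT item FROM price_history WHERE to_date = ? ORDER BY item", (OPEN_ENDED,))]
```

With a nullable `to_date` the test would be `WHERE to_date IS NULL`, which has only one possible spelling.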

This is why magic values are not a robust solution. Consumers of our data need to know that -1 is the value we use for DofQ in our stock control system when we don't know the real value. But at least it's obviously not a valid value. Choosing, say, 20 as a magic value is deadly because it could be a real DofQ: we can no longer tell the actual values from the "don't knows".
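One concrete way a magic value leaks into results is aggregate maths. This sketch assumes SQLite and a hypothetical `stock` table that records the same unknown DofQ once as -1 and once as NULL. SQL aggregates such as `AVG` skip NULLs, but they happily average in the -1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT, dofq_magic INTEGER, dofq_null INTEGER)")
# Item C's DofQ is unknown: stored as -1 in one column and as NULL in the other.
conn.executemany("INSERT INTO stock VALUES (?, ?, ?)", [
    ("A", 10, 10),
    ("B", 20, 20),
    ("C", -1, None),
])

# AVG includes the -1 sentinel but ignores the NULL.
avg_magic, avg_null = conn.execute(
    "SELECT AVG(dofq_magic), AVG(dofq_null) FROM stock").fetchone()
```

`avg_null` is 15.0, the average of the known values, while `avg_magic` is 29/3 (about 9.67) unless every consumer remembers to add `WHERE dofq_magic <> -1`.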

So, given a choice between nulls and magic values, choose nulls.


answered Nov 01 '22 10:11

APC

I'd be interested in your colleague's justification as to why empty fields are bad. As far as I'm aware, empty or null fields aren't bad in and of themselves. If you have a lot of empty values in a column that you are planning to put an important index on, you may want to consider other options. Actually, this applies to any indexed column with a lot of duplicate values, as duplicates lower the cardinality of the column, making indexes less useful. In your case, I don't see it being an issue.
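A rough way to see the cardinality point is to compute a column's selectivity: distinct values divided by total rows. This sketch assumes SQLite, a hypothetical `customers` table, and empty strings (rather than NULLs) for customers without an entry code. Real optimisers keep their own statistics (MySQL reports an estimate in the `Cardinality` column of `SHOW INDEX`), so treat this only as an illustration.

```python
import sqlite3

def selectivity(conn, table, column):
    # distinct values / total rows: near 1.0 is highly selective,
    # near 0 means an index on the column narrows the search very little.
    # table/column are interpolated into the SQL, so pass trusted identifiers only.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}").fetchone()
    return distinct / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, entry_code TEXT)")
# 1,000 customers; only 10 have an entry code, the rest store an empty string.
conn.executemany("INSERT INTO customers (entry_code) VALUES (?)",
                 [(f"E{i}" if i < 10 else "",) for i in range(1000)])
```

Here `selectivity(conn, "customers", "id")` is 1.0, while `entry_code` has only 11 distinct values (10 codes plus the empty string), giving 0.011.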

For this kind of data, you're likely using a VARCHAR or some kind of TEXT column anyway, which are variable-length fields in the database. It doesn't matter if your field is full of data or empty, you're still going to incur the overhead of a variable-length column (which isn't worth worrying about in normal circumstances). So again, there's no difference to the RDBMS.

From the sounds of what you're designing, I think if you came up with a generic method of handling address variances in a single table, it would be the way to go. Your code and structure would be much simpler at the negligible (in my opinion) cost of some empty data fields.
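One possible shape for such a generic method, sketched under assumptions: a hypothetical per-country configuration decides which optional columns each form shows, and submitted forms are mapped onto the full single-table row with unused columns left as `None` (NULL). The field names are illustrative only.

```python
# Hypothetical configuration: which optional columns each country's form uses.
COUNTRY_FIELDS = {
    "FR": ("title", "entry_code", "floor_level"),
    "ZA": ("security_no",),
}
COMMON_FIELDS = ("name", "street", "city", "postcode")
OPTIONAL_FIELDS = frozenset(f for fields in COUNTRY_FIELDS.values() for f in fields)

def form_fields(country):
    # Fields to render on the form for this country.
    return COMMON_FIELDS + COUNTRY_FIELDS.get(country, ())

def to_row(country, submitted):
    """Map a submitted form onto the single-table row; unused columns become None."""
    unexpected = set(submitted) - set(form_fields(country))
    if unexpected:
        raise ValueError(f"fields not used for {country}: {sorted(unexpected)}")
    all_columns = COMMON_FIELDS + tuple(sorted(OPTIONAL_FIELDS))
    return {"country": country, **{col: submitted.get(col) for col in all_columns}}
```

Adding a country then means adding one entry to `COUNTRY_FIELDS` (plus any new nullable column), rather than a new table and join.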

answered Nov 01 '22 10:11

zombat
