What's the optimal way to store binary flags / boolean values in each database engine?

I've seen some possible approaches (in some database engines some of them are synonyms):

1. TINYINT(1)
2. BOOL
3. BIT(1)
4. ENUM(0,1)
5. CHAR(0) NULL

All major database engines supported by PHP should be covered; as a reference, it would be even better if other engines were covered too.

I'm asking for a design that is best optimized for reading, e.g. SELECTing with the flag field in the WHERE condition, or GROUP BY on the flag. Performance is much more important than storage space (except when the size has an impact on performance).

And some more details:

When creating the table I can't know whether it will be sparse (whether most flags are on or off), but I can ALTER the tables later on, so if there is something I can optimize once I know that, it should be noted.

Also, if it makes a difference whether there is only one flag (or a few) per row versus many flags, that should be noted.

BTW, I've read the following somewhere on SO:

Using boolean may do the same thing as using tinyint, however it has the advantage of semantically conveying what your intention is, and that's worth something.

Well, in my case it isn't worth anything, because each table is represented by a class in my application, and everything is explicitly defined in the class and well documented.

asked Dec 26 '10 22:12

xun

1 Answer

This answer is for ISO/IEC/ANSI Standard SQL, and includes the better freeware pretend-SQLs.

First problem is you have identified two Categories, not one, so they cannot be reasonably compared.

A. Category One

(1) (4) and (5) contain multiple possible values and are one category. All can be easily and effectively used in the WHERE clause. They have the same storage so neither storage nor read performance is an issue. Therefore the remaining choice is simply based on the actual Datatype for the purpose of the column.

ENUM is non-standard; the better or standard method is to use a lookup table; then the values are visible in a table, not hidden, and can be enumerated by any report tool. The read performance of ENUM will suffer a small hit due to the internal processing.
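A minimal sketch of the lookup-table approach, using Python's built-in sqlite3 module purely as a stand-in engine so the example is runnable. The table and column names (order_status, customer_order, status_code) are hypothetical, and other platforms will differ in syntax and behaviour.

```python
import sqlite3

# SQLite (in memory) is only a stand-in engine; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE order_status (
        status_code CHAR(1)     NOT NULL PRIMARY KEY,
        name        VARCHAR(20) NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        order_id    INTEGER NOT NULL PRIMARY KEY,
        status_code CHAR(1) NOT NULL REFERENCES order_status (status_code)
    );
    INSERT INTO order_status VALUES ('O', 'Open'), ('C', 'Closed');
    INSERT INTO customer_order VALUES (1, 'O'), (2, 'C'), (3, 'O');
""")

# The permitted values are ordinary rows, so any report tool can enumerate them.
allowed = [row[0] for row in conn.execute("SELECT name FROM order_status ORDER BY name")]

# The foreign key rejects a value that is not present in the lookup table.
try:
    conn.execute("INSERT INTO customer_order VALUES (4, 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here the set of legal values lives in order_status as data, which is the visibility that an ENUM declaration does not give you.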

B. Category Two

(2) and (3) are Two-Valued elements: True/False; Male/Female; Dead/Alive. That category is different to Category One. Its treatment, both in your data model and in each platform, is different. BOOLEAN is just a synonym for BIT; they are the same thing. Legally (SQL-wise) they are handled the same by all SQL-compliant platforms, and there is no problem using them in the WHERE clause.

The difference in performance depends on the platform. Sybase and DB2 pack up to 8 BITs into one byte (not that storage matters here), and map the power-of-two on the fly, so performance is really good. Oracle does different things in each version, and I have seen modellers use CHAR(1) instead of BIT, to overcome performance problems. MS was fine up to 2005 but they have broken it with 2008, in that the results are unpredictable; so the short answer may be to implement it as CHAR(1).
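To illustrate the packing idea only, here is a minimal Python sketch of how up to eight flags can share one byte and be tested with a power-of-two mask. It is not Sybase's or DB2's actual implementation, and the flag names are hypothetical; it shows the kind of work the engine does internally, behind separately declared BIT columns.

```python
FLAG_NAMES = ["is_active", "is_deleted", "is_verified"]  # hypothetical BIT columns


def pack_flags(values):
    """Pack up to 8 booleans into one byte; flag i occupies the bit worth 2**i."""
    if len(values) > 8:
        raise ValueError("one byte holds at most 8 flags")
    packed = 0
    for position, value in enumerate(values):
        if value:
            packed |= 1 << position
    return packed


def read_flag(packed, position):
    """Test a single flag by masking with its power of two."""
    return bool(packed & (1 << position))


# is_active = True, is_deleted = False, is_verified = True
row_byte = pack_flags([True, False, True])
flags = {name: read_flag(row_byte, i) for i, name in enumerate(FLAG_NAMES)}
```

Each lookup is a single AND against a constant mask, which is why a platform that does this internally can read a BIT column cheaply.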

Of course, the assumption is that you do not do silly things such as pack 8 separate columns into one TINYINT. Not only is that a serious Normalisation error, it is a nightmare for coders. Keep each column discrete and of the correct Datatype.

C. Multiple Indicator & Nullable Columns

This has nothing to do with, and is independent of, (A) and (B). What the column's correct Datatype is, is separate to how many you have and whether it is Nullable. Nullable means (usually) the column is optional. Essentially you have not completed the modelling or Normalisation exercise. The Functional Dependencies are ambiguous. If you complete the Normalisation exercise, there will be no Nullable columns, no optional columns; either they clearly exist for a particular relation, or they do not exist. That means using the ordinary Relational structure of Supertype-Subtypes.

Sure, that means more tables, but no Nulls. Enterprise DBMS have no problem with more tables or more joins; that is what they are optimised for. Normalised databases perform much better than unnormalised or denormalised ones, and they can be extended without "re-factoring". You can ease the use by supplying a View for each Subtype.
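A minimal sketch of a Supertype-Subtype pair with a View over the Subtype, again using Python's sqlite3 module only as a runnable stand-in. The names party, employee, employee_v and hire_date are hypothetical. The optional attribute (hire_date) lives in the Subtype table, where it is NOT NULL, instead of being a Nullable column on the Supertype.

```python
import sqlite3

# SQLite (in memory) is only a stand-in engine; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Supertype: columns that every party has
    CREATE TABLE party (
        party_id INTEGER     NOT NULL PRIMARY KEY,
        name     VARCHAR(50) NOT NULL
    );
    -- Subtype: a row exists only for parties that are employees
    CREATE TABLE employee (
        party_id  INTEGER NOT NULL PRIMARY KEY REFERENCES party (party_id),
        hire_date DATE    NOT NULL
    );
    -- One View per Subtype hides the join from the coders
    CREATE VIEW employee_v AS
        SELECT p.party_id, p.name, e.hire_date
        FROM party p JOIN employee e ON e.party_id = p.party_id;
    INSERT INTO party VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO employee VALUES (1, '2010-12-26');
""")

employees = conn.execute(
    "SELECT party_id, name, hire_date FROM employee_v"
).fetchall()
party_count = conn.execute("SELECT COUNT(*) FROM party").fetchone()[0]
```

Bob has no employee row at all, so nothing has to be stored as Null for him.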

If you want more information on this subject, look at this question/answer. If you need help with the modelling, please ask a new question. At your level of questioning, I would advise that you stick with 5NF.

D. Performance of Nulls

Separately, if performance is important to you, then exclude Nulls. Each Nullable column is stored as variable length; that requires additional processing for each row/column. The enterprise databases use a "deferred" handling for such rows, to allow the logging, etc. to move through the queues without impeding the fixed rows. In particular, never use variable length columns (that includes Nullable columns) in an Index: that requires unpacking on every access.
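A minimal sketch of a flag declared NOT NULL and fixed length, then indexed and used in GROUP BY. SQLite via Python's sqlite3 module is only a stand-in here and does not reproduce the storage behaviour described above; the account table, the is_active column and the 'Y'/'N' encoding are assumptions for illustration.

```python
import sqlite3

# SQLite (in memory) is only a stand-in engine; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER NOT NULL PRIMARY KEY,
        is_active  CHAR(1) NOT NULL DEFAULT 'N' CHECK (is_active IN ('Y', 'N'))
    );
    CREATE INDEX account_is_active ON account (is_active);
    INSERT INTO account (account_id, is_active) VALUES (1, 'Y'), (2, 'N'), (3, 'Y');
    INSERT INTO account (account_id) VALUES (4);  -- falls back to the default 'N'
""")

# The flag can be used directly in WHERE or GROUP BY.
counts = dict(conn.execute("SELECT is_active, COUNT(*) FROM account GROUP BY is_active"))

# The NOT NULL constraint refuses a Null flag outright.
try:
    conn.execute("INSERT INTO account VALUES (5, NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

With a default in place, coders who do not care about the flag never need to supply it, and the column still never holds a Null.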

E. Poll

Finally, I do not see the point in this question being a poll. It is fair enough that you will get technical answers, and even opinions, but polls are for popularity contests, and the technical ability of responders at SO covers a very wide range, so the most popular answers and the most technically correct answers are at two different ends of the spectrum.

answered Sep 22 '22 14:09

PerformanceDBA

Tags: flags, database-design, bitflags
