Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can bitwise math be used for one-to-many relationships in SQL?

Proper normalization in an RDBMS means a proliferation of tables. Integer fields can store orthogonal data as bits – can this be used as a substitute for an additional table, without sacrificing relational integrity?

like image 795
Matt Sherman Avatar asked Aug 02 '12 17:08

Matt Sherman


3 Answers

For a one-to-many relationship, where the “many” has a small number of known values, relationships can be stored as bitmasks in the parent table as an integer, replacing the need for an additional table.

Say we have a table Person and we’d like to know how many Continents a person has visited. We’d start by assigning each Continent an “orthogonal” bit value. In C#, an enum is a good choice for this:

[Flags]
public enum JobAdvertisingRegion
{
    NorthAmerica = 1,              // or 1 << 0
    SouthAmerica = 2,              // 1 << 1
    Europe = 4,                    // 1 << 2
    Asia = 8,                      // 1 << 3
    Africa = = 16,                 // 1 << 4
    Australia = 32,                // 1 << 5
    Anarctica = 64                 // 1 << 6
}

The Persons table could then simply have a int column called Contintents. To indicate that a Person has visited Europe and Asia:

UPDATE Persons SET Continents = (4 + 8) WHERE Id = whatever

To search for Persons who have visited Antarctica, we use bitwise math:

SELECT * FROM Persons WHERE Continents & 64 = 64

To search for Persons who have visited both Africa and Asia:

SELECT * FROM Persons WHERE Continents & (16 + 8) = (16 + 8)

To search for Persons who have visited either Australia or South America:

SELECT * FROM Persons WHERE Continents & (32 + 2) != 0

One downside is that, while integer columns are indexable in SQL, their bit components are not. Some optimizations to get around this, for the above queries:

SELECT * FROM Persons WHERE Continents & 64 = 64 AND Continents >= 64

SELECT * FROM Persons WHERE Continents & (16 + 8) = (16 + 8) AND Continents >= (16 + 8)

SELECT * FROM Persons WHERE Continents & (32 + 2) != 0 AND Continents >= 2
like image 176
Matt Sherman Avatar answered Nov 15 '22 17:11

Matt Sherman


The answer to your question is "no". The bit fields sacrifice relational integrity, for the simple reason that you have entities in the database that don't have a corresponding tables.

That said, many databases offer support for this, generally through a "bit" data type. Mysql has even stronger support, with the "set" data type.

The primary issue is that you do not know anything about the elements in the set -- what the full name is, when it was added into the database, and so on. (Enums get around part of the naming problem.) In addition, the size of the set is limited. You may have an example where things are limited. However, Matt's example rather emphasizes the problem here. You can have a list of continents visited. However, when you switch to countries visited, the approach is necessarily quite different, because the number of countries no longer fits in a single "word". Would you want your system to treat continents very differently from countries in this respect? Do you want your design decisions to be restricted by the limit of 32 or 64 bits in a computer word?

And finally, you seem to view the proliferation of tables as a problem. The proliferation of tables is actually a solution. All data about entities is stored in tables, rather than being spread out through the system. You can maintain information about instances of the entity, such as when the instance was created, how it might have changed over time, and so on. An entity for "continents" is likely to be used whenever someone wants a continent.

Consider what happens in a system where two different developers decide to develop their own bit masks for continents -- but they put the continents in a different order. With a well designed relational database (meaning that foreign key relationships are explicitly declared in the table definition), such confusion could not arise.

like image 28
Gordon Linoff Avatar answered Nov 15 '22 17:11

Gordon Linoff


Well, I'll go against the (currently) popular opinion here by simply stating few facts

  • SQL and relational model are not the same thing.
  • Relational model (theory) works with relational variables, so called relvars.
  • SQL databases use tables.
  • Tables and relvars are not the same, but can be -- for all the practical purposes.
  • To use relational modelling theory, tables should represent relvars.
  • For a table to represent a relational variable the following should be true:

    1. Rows have no order
    2. Columns have no order
    3. No duplicate rows
    4. Every column-row intersection has only one value of the column-defined type
    5. There are no special hidden columns (row-ids, object ids ...)

So, you can do many things with tables and SQL that is out of the scope of the relational design theory, but you do lose benefits of the "relational" ...

Technically your post (question) has two answers.

  • Yes, to the title of the post.
  • No, to the body of the post.
like image 1
Damir Sudarevic Avatar answered Nov 15 '22 17:11

Damir Sudarevic