Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing sex (gender) in database

People also ask

How is gender stored in a database?

Initially, you can load it up with: INSERT INTO static_gender VALUES (DEFAULT, 'F', 'female'), (DEFAULT, 'M', 'male'); That way you can expand the table as new values for gender become necessary. In your USER (or whatever) table, you store static_gender_id and get the value for the gender with a JOIN.

What data type can be used to store gender?

Show activity on this post. Per the standard, the column should be called "Sex" and the 'closest' data type would be tinyint with a CHECK constraint or lookup table as appropriate.

Is gender a Boolean data type?

Any piece of information that can have two and only two choices should be declared as data type Boolean. For example, GENDER should be Boolean because there are only two choices.


There is already an ISO standard for this; no need to invent your own scheme:

http://en.wikipedia.org/wiki/ISO_5218

Per the standard, the column should be called "Sex" and the 'closest' data type would be tinyint with a CHECK constraint or lookup table as appropriate.


I'd call the column "gender".

Data Type   Bytes Taken          Number/Range of Values
------------------------------------------------
TinyINT     1                    255 (zero to 255)
INT         4            -       2,147,483,648 to 2,147,483,647
BIT         1 (2 if 9+ columns)  2 (0 and 1)
CHAR(1)     1                    26 if case insensitive, 52 otherwise

The BIT data type can be ruled out because it only supports two possible genders which is inadequate. While INT supports more than two options, it takes 4 bytes -- performance will be better with a smaller/more narrow data type.

CHAR(1) has the edge over TinyINT - both take the same number of bytes, but CHAR provides a more narrow number of values. Using CHAR(1) would make using "m", "f",etc natural keys, vs the use of numeric data which are referred to as surrogate/artificial keys. CHAR(1) is also supported on any database, should there be a need to port.

Conclusion

I would use Option 2: CHAR(1).

Addendum

An index on the gender column likely would not help because there's no value in an index on a low cardinality column. Meaning, there's not enough variety in the values for the index to provide any value.


In medicine there are four genders: male, female, indeterminate, and unknown. You mightn't need all four but you certainly need 1, 2, and 4. It's not appropriate to have a default value for this datatype. Even less to treat it as a Boolean with 'is' and 'isn't' states.


An Int (or TinyInt) aligned to an Enum field would be my methodology.

First, if you have a single bit field in a database, the row will still use a full byte, so as far as space savings, it only pays off if you have multiple bit fields.

Second, strings/chars have a "magic value" feel to them, regardless of how obvious they may seem at design time. Not to mention, it lets people store just about any value they would not necessarily map to anything obvious.

Third, a numeric value is much easier (and better practice) to create a lookup table for, in order to enforce referential integrity, and can correlate 1-to-1 with an enum, so there is parity in storing the value in memory within the application or in the database.


Option 3 is your best bet, but not all DB engines have a "bit" type. If you don't have a bit, then TinyINT would be your best bet.