Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it *really* worth to use integer over varchar for a set of data?

For example if I have a table User, I want to store gender or sex, I'll add a column like sex.

Is it really worth to use an integer and then map it in my favorite programming language?

Like 1 => 'Male' and 2 => 'Female'

Is there any performance reason to do that?

Or could I safely use a varchar which more meaning with 'female' or 'male' almost like I was using mysql ENUM ?

Edit: I here and there that it is sometimes better, sometimes it doesn't matter, so I more looking for benchmark or something over a "it is better" answer.

I mean I think using varchar is actually more meaningfull than an integer, and I would use an integer only if performance are more than 0.3% or something.

like image 632
Trent Avatar asked Sep 24 '11 12:09

Trent


People also ask

Should I use VARCHAR or int?

1 Answer. Int comparisons are faster than varchar comparisons, and ints take much less space than varchars. This is applicable true for both unindexed and indexed access. You can use an indexed int column to make it faster.

What is the difference between VARCHAR and integer?

Integer is for numbers, and varchar is for numbers, letters and other characters (Short text). So for age you can use a int type, for genders you can use the enum() type if there are only two options.

Can we use integer in VARCHAR?

A VARCHAR declaration must include a positive integer in parentheses to define the maximum allowable character string length. For example, VARCHAR(n) can accept any length of character string up to n characters in length. The length parameter may take any value from 1 to the current table page size.

What is the advantage of text over VARCHAR?

Some Differences Between VARCHAR and TEXT The VAR in VARCHAR means that you can set the max size to anything between 1 and 65,535. TEXT fields have a fixed max size of 65,535 characters. A VARCHAR can be part of an index whereas a TEXT field requires you to specify a prefix length, which can be part of an index.


2 Answers

If this is for some homebrew website or application that will serve 10 people, then do whatever you want, performance won't make a difference.

If this is for something real then skip rolling your own implementation of gender and follow the ISO standard for sex. Or at least adhere to standards wherever they exist (thanks Joe Celko!)

0 = not known
1 = male
2 = female
9 = not applicable

Always rightsize your data type

  • Disk space savings:
    At my last job, the pedantic people in charge of designing tables created a column as decimal with 0 precision because it should only be N digits. The difference in storage cost between that and a whole number data type was 1 or 2 bytes. However, as this table was very large the aggregate cost savings of having the smaller data type was measure in gigabytes on the table alone.

  • Access savings: A second cost that most don't think about is the cost to read information from disk or to keep data in memory. In SQL Server, data is stored in 8K pages. If you are using fat data types, it will take more reads to get data off disk and then you can store subsequently fewer data pages in memory. Pulling data off of disk is where you will incur the biggest performance cost. If you want to speed up things that use a database, don't bone the physical implementation.

Implement as the smallest allowable type in your system that will cover the problem domain. For something like gender, use a tinyint (MySQL, SQL Server) or number(5,0) in Oracle and you'll be spending 1 bye of storage for each gender.

Internationlization

M = Male, F = Female, that seems obvious. ¿Verdad? Aqui, nosotros hablamos español. And that's about as far as my Spanish caries me, but my point is that if you ever need to be multi-lingual, 1 will serve males, gentes, mannlich, masculin, etc. M or Male will only serve an English speaking audience. Further more, then you run into weird presentation logic of "We need to translate everything unless it's going to $culture." It is a far cleaner design to have presentation logic is the UI and keep it out of the database.

like image 100
billinkc Avatar answered Oct 12 '22 23:10

billinkc


Ortiginal Answer:
I would suggest storing it in a CHAR(1) column as M or F
It is expressive enough for the specific purpose AND has the speed benefit of being a single character comparison

Update 4 (fixed benchmark):
All previous benchmarks had a fatal flaw that one (the CHAR(1)) table was MyISAM and all other were InnoDB. So I recreated the database with all tables using the MyISAM and the results make much more sense now.

The error creeped in as I used the MySQLWorkbench's wizard to create the tables and forgot to change the database engine in the other tables and it defaulted to InnoDB (I have MySQL 5.5)

So the corrected results are as follows, (I have removed all my previous benchmarks as they were invalid) :

// select queries
$query['char'] = "select count(*) from test_table where gender = 'M'";
$query['char_lower'] = "select count(*) from test_table where LOWER(gender) = 'm'";
$query['varchar'] = "select count(*) from test_table_2 where gender = 'Male'";
$query['varchar_lower'] = "select count(*) from test_table_2 where LOWER(gender) = 'male'";
$query['tinyint'] = "select count(*) from test_table_3 where gender = 1";

// benchmark result
array
  'char' => float 0.35457420349121
  'char_lower' => float 0.44702696800232
  'varchar' => float 0.50844311714172
  'varchar_lower' => float 0.64412498474121
  'tinyint' => float 0.26296806335449

New Conclusion : TINYINT Is fastest. But my recommendation would be still yo use CHAR(1) as it would be easier for future developers to understand the database.

If you do use TINYINT, my recommendation would be name the column ismale instead of sex and store 0 => Female and 1 => male thus making it a little more easy to understand in raw database.

The table structure for benchmark is this:

CREATE TABLE `test_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `gender` char(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM

Only the type of the gender column is different in the 3 tables, the types are:

CHAR(1), VARCHAR(6), TINYINT

All 3 tables have 10000 entries.

like image 32
danishgoel Avatar answered Oct 12 '22 23:10

danishgoel