How to choose optimized datatypes for columns [innodb specific]?

Tags: database, mysql, innodb, database-design

I'm learning about the usage of datatypes for databases.

For example:

Which is better for email? varchar(100), char(100), or tinyint (joking)
Which is better for username? Should I use int, bigint, or varchar? Please explain. Some of my friends say that a numeric datatype such as int or bigint is better (Facebook does it), so that u=123400023 refers to user 123400023 rather than user=thenameoftheuser, since numbers take less time to fetch.
Which is better for phone numbers? Posts (like in blogs or announcements)? Or maybe dates (I use datetime for that)? Maybe someone has done research that they would like to share.
Product price (I use decimal(11,2), don't know about you guys)?
Or anything else that you have in mind, like, "I use serial datatype for blablabla".

Why do I mention innodb specifically?

Unless you are using the InnoDB table types (see Chapter 11, "Advanced MySQL," for more information), CHAR columns are faster to access than VARCHAR.

InnoDB has some difference that I don't know about. I read that from here.

asked Jul 20 '10 03:07

Adam Ramadhan

2 Answers

Brief Summary:

(just my opinions)

for email address - VARCHAR(255)
for username - VARCHAR(100) or VARCHAR(255)
for id_username - use INT (unless you plan on over 2 billion users in your system)
phone numbers - INT or VARCHAR or maybe CHAR (depends on whether you want to store formatting; note that INT drops leading zeros and a leading "+")
posts - TEXT
dates - DATE or DATETIME (definitely include times for things like posts or emails)
money - DECIMAL(11,2)
misc - see below
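
The list above could translate into a schema roughly like the following. This is only a sketch: the table names, the extra column names (`phone`, `created_at`, `id_post`, `body`, `posted_at`, `id_product`, `price`) and the `VARCHAR(20)` length for phone numbers are assumptions made for illustration, not something the question specifies.

```sql
-- Hypothetical tables illustrating the suggested types.
CREATE TABLE users (
    id_username INT          NOT NULL AUTO_INCREMENT, -- signed INT tops out at 2,147,483,647
    username    VARCHAR(100) NOT NULL,
    email       VARCHAR(255) NOT NULL,
    phone       VARCHAR(20)  NULL,                    -- assumed length; keeps "+" and leading zeros
    created_at  DATETIME     NOT NULL,
    PRIMARY KEY (id_username)
) ENGINE=InnoDB;

CREATE TABLE posts (
    id_post     INT      NOT NULL AUTO_INCREMENT,
    id_username INT      NOT NULL,                    -- same type as users.id_username
    body        TEXT     NOT NULL,
    posted_at   DATETIME NOT NULL,                    -- date and time, not just DATE
    PRIMARY KEY (id_post),
    FOREIGN KEY (id_username) REFERENCES users (id_username)
) ENGINE=InnoDB;

CREATE TABLE products (
    id_product INT           NOT NULL AUTO_INCREMENT,
    price      DECIMAL(11,2) NOT NULL,                -- 9 digits before the point, 2 after
    PRIMARY KEY (id_product)
) ENGINE=InnoDB;
```

The foreign key is one of the reasons to pick InnoDB here; MyISAM would accept the syntax but not enforce the constraint.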

As far as using InnoDB because VARCHAR is supposed to be faster, I wouldn't worry about that, or speed in general. Use InnoDB because you need to do transactions and/or you want to use foreign key (FK) constraints for data integrity. Also, InnoDB uses row-level locking whereas MyISAM only uses table-level locking, so InnoDB handles high concurrency better than MyISAM. Use MyISAM for somewhat less overhead, or if you need full-text indexes on a MySQL version older than 5.6 (InnoDB has supported full-text indexes since 5.6).

More important for speed than the engine type: put indexes on the columns that you need to search on quickly. Always have indexes on your ID/PK columns, such as the id_username that I mentioned (declaring a column as PRIMARY KEY creates that index for you).
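
As a small sketch of that advice, assuming a hypothetical `accounts` table that is looked up by `username` (the table and index names are made up for this example):

```sql
-- Hypothetical table; the PRIMARY KEY is indexed automatically.
CREATE TABLE accounts (
    id_username INT          NOT NULL AUTO_INCREMENT,
    username    VARCHAR(100) NOT NULL,
    PRIMARY KEY (id_username)
) ENGINE=InnoDB;

-- Secondary index for a column that is searched frequently.
CREATE INDEX idx_accounts_username ON accounts (username);

-- EXPLAIN shows whether the lookup can use the new index.
EXPLAIN SELECT id_username FROM accounts WHERE username = 'thenameoftheuser';
```

If `EXPLAIN` lists `idx_accounts_username` under `key`, the query is using the index rather than scanning the whole table.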

More details:

Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):

What DataType should I pick?
Table design question
Enum datatype versus table of data in MySQL?
mysql datatype for telephne number and address
Best mysql datatype for grams, milligrams, micrograms and kilojoule
MySQL 5-star rating datatype?

And a couple questions on when to use the InnoDB engine:

MyISAM versus InnoDB
When should you choose to use InnoDB in MySQL?

I just use tinyint for almost everything (seriously).

Edit - How to store "posts:"

Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste space on padding compared with VARCHAR, which is a variable-length CHAR.

Prior to MySQL 5.0.3, VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. VARCHAR and TEXT now both have a max length of 65,535 (for VARCHAR, the effective maximum also depends on the maximum row size and the character set), but you can set your own max on VARCHAR. Let's say you think your posts will only need to be 2000 characters at most: you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more. In any case, you can always change that minor detail in the future.
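
Bumping the limit later might look like this. The table name `short_posts` and its columns are hypothetical, chosen only to show the statement shape:

```sql
-- Hypothetical table that starts with a 2000-character limit.
CREATE TABLE short_posts (
    id_post INT           NOT NULL AUTO_INCREMENT,
    body    VARCHAR(2000) NOT NULL,
    PRIMARY KEY (id_post)
) ENGINE=InnoDB;

-- MODIFY restates the whole column definition, so NOT NULL must be repeated.
ALTER TABLE short_posts MODIFY body VARCHAR(3000) NOT NULL;
```

On a large table this kind of ALTER can take a while, so it is worth trying on a copy first.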

More importantly, searching this "post" column using a full-text index instead of LIKE would be much faster (2). However, before MySQL 5.6 you have to use the MyISAM engine to get a full-text index, because InnoDB doesn't support it in those versions. In a MySQL database, you can have a heterogeneous mix of engines, one per table, so you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update a MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.
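
A rough sketch of the trigger approach, with hypothetical table, trigger, and index names (`posts_main`, `posts_search`, `posts_main_ai`, `ft_body`). Only the INSERT case is shown; a real setup would presumably need matching UPDATE and DELETE triggers as well.

```sql
-- Transactional source table (hypothetical).
CREATE TABLE posts_main (
    id_post INT  NOT NULL AUTO_INCREMENT,
    body    TEXT NOT NULL,
    PRIMARY KEY (id_post)
) ENGINE=InnoDB;

-- MyISAM copy that carries the full-text index.
CREATE TABLE posts_search (
    id_post INT  NOT NULL,
    body    TEXT NOT NULL,
    PRIMARY KEY (id_post),
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;

-- Copy every new post into the search table.
CREATE TRIGGER posts_main_ai AFTER INSERT ON posts_main
FOR EACH ROW
    INSERT INTO posts_search (id_post, body) VALUES (NEW.id_post, NEW.body);

-- Full-text search against the MyISAM copy ...
SELECT id_post FROM posts_search WHERE MATCH (body) AGAINST ('innodb');

-- ... compared with a LIKE scan, which cannot use a normal index
-- because of the leading wildcard.
SELECT id_post FROM posts_search WHERE body LIKE '%innodb%';
```

One caveat with this design: MyISAM is not transactional, so a row copied by the trigger stays in `posts_search` even if the InnoDB transaction is later rolled back.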

See bottom for some useful quotes.

MySQL Data Type Chart (outdated)
MySQL Datatypes (outdated)
Chapter 10. Data Types (better details)
The BLOB and TEXT Types (1)
11.9. Full-Text Search Functions (2)
10.4.1. The CHAR and VARCHAR Types (3)

(3) "Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.

Before MySQL 5.0.3, if you need a data type for which trailing spaces are not removed, consider using a BLOB or TEXT type.

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.

Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values."

Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:

VARCHAR(n) Considered Harmful

answered Oct 06 '22 08:10

JohnB

There are multiple angles to approach your question.

From a design POV it is always best to choose the datatype which best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, and especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is a shorthand for many desirable flags.
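
To see what the stricter mode changes, here is a minimal sketch using a hypothetical `mode_demo` table. The behaviour described in the comments assumes the session was previously running without a strict mode.

```sql
-- Switch the current session to the TRADITIONAL mode and inspect the
-- flags it expands to.
SET SESSION sql_mode = 'TRADITIONAL';
SELECT @@SESSION.sql_mode;

-- Hypothetical table: TINYINT UNSIGNED holds 0..255.
CREATE TABLE mode_demo (
    qty TINYINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

-- With TRADITIONAL this insert is rejected with an out-of-range error.
-- Without a strict mode, MySQL stores 255 instead and only raises a warning.
INSERT INTO mode_demo (qty) VALUES (300);
```

The second behaviour is exactly the kind of silently altered data that picking the right domain and size is meant to prevent.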

From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.

Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table was storing HTTP User-Agent information. By simply replacing each user agent string in the log table with the numeric id of a user agent string in a lookup table, data set size was reduced considerably (by more than 60%). By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index), data set size was reduced to 1% of the original size.

Finally, there are a number of rules that can help you spot errors in schema design.
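
The first step of that change (string replaced by an id into a lookup table) could be sketched as below. All table and column names, the `VARCHAR(512)` length, and the `ascii` character set are assumptions for the example, not details of the project described above.

```sql
-- Lookup table: each distinct user agent string is stored once.
CREATE TABLE user_agents (
    id_user_agent INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_agent    VARCHAR(512) CHARACTER SET ascii NOT NULL,
    PRIMARY KEY (id_user_agent),
    UNIQUE KEY uq_user_agent (user_agent)
) ENGINE=InnoDB;

-- Log table: stores a 4-byte id instead of the full string.
CREATE TABLE access_log (
    id_log        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    logged_at     DATETIME        NOT NULL,
    id_user_agent INT UNSIGNED    NOT NULL,
    PRIMARY KEY (id_log),
    FOREIGN KEY (id_user_agent) REFERENCES user_agents (id_user_agent)
) ENGINE=InnoDB;

-- Register the string if it is new (the unique key makes IGNORE skip duplicates) ...
INSERT IGNORE INTO user_agents (user_agent) VALUES ('Mozilla/5.0 (example)');

-- ... then log the request by id.
INSERT INTO access_log (logged_at, id_user_agent)
SELECT NOW(), id_user_agent
FROM user_agents
WHERE user_agent = 'Mozilla/5.0 (example)';
```

Parsing the string further into operating system, browser, and version ids would follow the same pattern with one lookup table per attribute.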

For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of innodb).

For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).

For example, anything that works on monetary data and is not using the DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL is doing BCD, decimal paper math with correct precision and rounding, DOUBLE and FLOAT do not).
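
These three rules can be illustrated with a short sketch. The `articles` table is hypothetical; the two comparison queries follow the precision-math examples in the MySQL manual, where plain literals such as `.1` are exact-value (DECIMAL) and literals with an exponent such as `.1E0` are approximate-value (DOUBLE).

```sql
-- Hypothetical table following the rules: unsigned id, unsigned DECIMAL price.
CREATE TABLE articles (
    id_article INT UNSIGNED           NOT NULL AUTO_INCREMENT,
    price      DECIMAL(11,2) UNSIGNED NOT NULL,
    PRIMARY KEY (id_article)
) ENGINE=InnoDB;

-- Exact decimal arithmetic: the comparison is true (1).
SELECT (.1 + .2) = .3 AS decimal_equal;

-- Approximate floating-point arithmetic: the comparison is false (0).
SELECT (.1E0 + .2E0) = .3E0 AS double_equal;
```

Note that MySQL 8.0.17 and later deprecate `UNSIGNED` on `DECIMAL` columns; on those versions a `CHECK (price >= 0)` constraint is a possible alternative for keeping negative prices out.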

answered Oct 06 '22 06:10

Isotopp
