
How to choose optimized datatypes for columns [innodb specific]?

I'm learning about the usage of datatypes for databases.

For example:

  • Which is better for email? varchar[100], char[100], or tinyint (joking)
  • Which is better for username? Should I use int, bigint, or varchar? Explain. Some of my friends say that if we use int, bigint, or another numeric datatype it will be better (Facebook does it). Like u=123400023 refers to user 123400023, rather than user=thenameoftheuser, since numbers take less time to fetch.
  • Which is better for phone numbers? Posts (like in blogs or announcements)? Or maybe dates (I use datetime for that)? Maybe some of you have done research that you would like to share.
  • Product price (I use decimal(11,2), don't know about you guys)?
  • Or anything else that you have in mind, like, "I use serial datatype for blablabla".

Why do I mention innodb specifically?

Unless you are using the InnoDB table types (see Chapter 11, "Advanced MySQL," for more information), CHAR columns are faster to access than VARCHAR.

InnoDB differs in some way that I don't know about. I read that from here.

Adam Ramadhan asked Jul 20 '10 03:07




2 Answers

Brief Summary:

(just my opinions)

  1. for email address - VARCHAR(255)
  2. for username - VARCHAR(100) or VARCHAR(255)
  3. for id_username - use INT (unless you plan on over 2 billion users in your system)
  4. phone numbers - INT or VARCHAR or maybe CHAR (depends on if you want to store formatting)
  5. posts - TEXT
  6. dates - DATE or DATETIME (definitely include times for things like posts or emails)
  7. money - DECIMAL(11,2)
  8. misc - see below
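
To put a number on the "over 2 billion users" remark: the limit follows from INT being 4 bytes wide. Here is a rough Python sketch (the helper name int_range is made up for illustration) that derives the ranges of MySQL's integer types from their storage sizes:

```python
# Storage size in bytes of MySQL's integer types.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(type_name, unsigned=False):
    """Return (min, max) for a MySQL integer type (hypothetical helper)."""
    bits = INTEGER_BYTES[type_name] * 8
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in INTEGER_BYTES:
    print(name, int_range(name), int_range(name, unsigned=True))
```

A signed INT tops out at 2,147,483,647, and INT UNSIGNED doubles that to 4,294,967,295 for the same 4 bytes, which is one reason to declare id columns unsigned.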

As far as using InnoDB because VARCHAR is supposed to be faster, I wouldn't worry about that, or speed in general. Use InnoDB because you need to do transactions and/or you want to use foreign key constraints (FK) for data integrity. Also, InnoDB uses row-level locking whereas MyISAM only uses table-level locking. Therefore, InnoDB can handle higher levels of concurrency better than MyISAM. Use MyISAM if you need full-text indexes (InnoDB only gained them in MySQL 5.6) or want somewhat less overhead.

More importantly for speed than the engine type: put indexes on the columns that you need to search on quickly. Always put indexes on your ID/PK columns, such as the id_username that I mentioned.
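
If you want to see what an index changes, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in because it runs without a server; that is an assumption for illustration only, and MySQL's EXPLAIN output looks different. The table, column, and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [(f"user{i}",) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


lookup = "SELECT id FROM users WHERE username = ?"
plan_before = query_plan(conn, lookup, ("user500",))
conn.execute("CREATE INDEX idx_users_username ON users (username)")
plan_after = query_plan(conn, lookup, ("user500",))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the planner has to scan the whole table; afterwards the plan mentions idx_users_username. The same idea applies to the columns you search on in MySQL.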

More details:

Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):

  • What DataType should I pick?

  • Table design question

  • Enum datatype versus table of data in MySQL?

  • mysql datatype for telephne number and address

  • Best mysql datatype for grams, milligrams, micrograms and kilojoule

  • MySQL 5-star rating datatype?

And a couple questions on when to use the InnoDB engine:

  • MyISAM versus InnoDB

  • When should you choose to use InnoDB in MySQL?

I just use tinyint for almost everything (seriously).

Edit - How to store "posts:"

Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste unused characters versus VARCHAR, which is variable length CHAR.
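
To make the "wasted characters" point concrete, here is a back-of-the-envelope sketch. It assumes a single-byte character set such as latin1 and counts only the column data plus VARCHAR's length prefix (1 byte when the column can hold at most 255 bytes, 2 bytes otherwise). Real InnoDB rows carry extra overhead, so treat the numbers as rough. Both function names are made up.

```python
def char_bytes(declared_length):
    """CHAR(n) is padded to its declared length, whatever the value."""
    return declared_length


def varchar_bytes(value, declared_length):
    """VARCHAR(n) stores the value itself plus a 1- or 2-byte length prefix."""
    if len(value) > declared_length:
        raise ValueError("value longer than declared length")
    prefix = 1 if declared_length <= 255 else 2
    return len(value) + prefix


email = "a@b.co"
print("CHAR(100):    ", char_bytes(100))
print("VARCHAR(100): ", varchar_bytes(email, 100))
print("VARCHAR(2000):", varchar_bytes(email, 2000))
```

For a six-character value, CHAR(100) still takes 100 bytes while VARCHAR(100) takes 7, and raising the VARCHAR limit to 2000 costs only one more prefix byte.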

Prior to MySQL 5.0.3, VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. VARCHAR and TEXT both now have a max length of 65,535, but you can set your own max on VARCHAR. Let's say you think your posts will only need to be 2000 max: you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more, but you can always change that minor detail in the future.

More importantly, searching this "post" column using a Full-Text Index instead of LIKE would be much faster (2). However, before MySQL 5.6 you have to use the MyISAM engine to get a full-text index, because InnoDB doesn't support it in those versions. In a MySQL database, you can have a heterogeneous mix of engines for each table, so you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update a MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.

See bottom for some useful quotes.

  • MySQL Data Type Chart (outdated)

  • MySQL Datatypes (outdated)

  • Chapter 10. Data Types (better details)

  • The BLOB and TEXT Types (1)

  • 11.9. Full-Text Search Functions (2)

  • 10.4.1. The CHAR and VARCHAR Types (3)

(3) "Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.

Before MySQL 5.0.3, if you need a data type for which trailing spaces are not removed, consider using a BLOB or TEXT type.

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.

Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values."

Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:

  • VARCHAR(n) Considered Harmful
JohnB answered Oct 06 '22 08:10


There are multiple angles to approach your question.

From a design POV it is always best to choose the datatype that best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is a shorthand for many desirable flags.

From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.

Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table was storing HTTP User-Agent information. By simply replacing each user agent string in the log table with the numeric id of that string in a lookup table, data set size was reduced considerably (by more than 60%). By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index), data set size was reduced to 1% of the original size.
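
A minimal sketch of that lookup-table idea follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names, and sample user agent strings are all invented for illustration; this is not the schema of the project described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_agents (
        id    INTEGER PRIMARY KEY,
        agent TEXT NOT NULL UNIQUE
    );
    CREATE TABLE access_log (
        id            INTEGER PRIMARY KEY,
        path          TEXT NOT NULL,
        user_agent_id INTEGER NOT NULL REFERENCES user_agents(id)
    );
""")


def user_agent_id(connection, agent):
    """Return the id for an agent string, inserting it the first time it is seen."""
    connection.execute(
        "INSERT OR IGNORE INTO user_agents (agent) VALUES (?)", (agent,)
    )
    row = connection.execute(
        "SELECT id FROM user_agents WHERE agent = ?", (agent,)
    ).fetchone()
    return row[0]


hits = [
    ("/", "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"),
    ("/about", "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"),
    ("/", "ExampleBot/2.1"),
    ("/contact", "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"),
]
for path, agent in hits:
    conn.execute(
        "INSERT INTO access_log (path, user_agent_id) VALUES (?, ?)",
        (path, user_agent_id(conn, agent)),
    )

log_rows = conn.execute("SELECT COUNT(*) FROM access_log").fetchone()[0]
agent_rows = conn.execute("SELECT COUNT(*) FROM user_agents").fetchone()[0]
print(log_rows, "log rows share", agent_rows, "stored user agent strings")
```

Each long string is stored once, and every log row carries only a small integer. A JOIN on user_agent_id brings the original string back whenever a report needs it.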

Finally, there are a number of rules that can help you spot errors in schema design.

For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of innodb).

For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).

For example, anything that works on monetary data and is not using the DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL is doing BCD, decimal paper math with correct precision and rounding, DOUBLE and FLOAT do not).
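
The difference is easy to reproduce outside the database. The sketch below uses Python's decimal module as an analogy for MySQL's DECIMAL and Python's float (a binary double) as an analogy for DOUBLE; it is not MySQL itself, and the variable names and amounts are made up.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
# Decimal arithmetic works on base-10 digits, like sums done on paper.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True

# Multiplying a price by a quantity stays exact as well.
line_total = Decimal("19.99") * 3
print(line_total)                        # 59.97
```

The float sum comes out as 0.30000000000000004, the kind of error that accumulates across many rows of monetary data.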

Isotopp answered Oct 06 '22 06:10