Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql table with 40+ columns

I have 40+ columns in my table and i have to add few more fields like, current city, hometown, school, work, uni, collage..

These user data wil be pulled for many matching users who are mutual friends (joining friend table with other user friend to see mutual friends) and who are not blocked and also who is not already friend with the user.

The above request is little complex, so i thought it would be good idea to put extra data in same user table to fast access, rather then adding more joins to the table, it will slow the query more down. but i wanted to get your suggestion on this

my friend told me to add the extra fields, which wont be searched on one field as serialized data.



ERD Diagram:

  • My current table: http://i.stack.imgur.com/KMwxb.png
  • If i join into more tables: http://i.stack.imgur.com/xhAxE.png



Some Suggestions

  1. nothing wrong with this table and columns
  2. follow this approach MySQL: Optimize table with lots of columns - which serialize extra fields into one field, which are not searchable's
  3. create another table and put most of the data there. (this gets harder on joins, if i already have 3 or more tables to join to pull the records for users (ex. friends, user, check mutual friends)
like image 828
Basit Avatar asked Dec 21 '22 11:12

Basit


2 Answers

As usual - it depends.

Firstly, there is a maximum number of columns MySQL can support, and you don't really want to get there.

Secondly, there is a performance impact when inserting or updating if you have lots of columns with an index (though I'm not sure if this matters on modern hardware).

Thirdly, large tables are often a dumping ground for all data that seems related to the core entity; this rapidly makes the design unclear. For instance, the design you present shows 3 different "status" type fields (status, is_admin, and fb_account_verified) - I suspect there's some business logic that should link those together (an admin must be a verified user, for instance), but your design doesn't support that.

This may or may not be a problem - it's more a conceptual, architecture/design question than a performance/will it work thing. However, in such cases, you may consider creating tables to reflect the related information about the account, even if it doesn't have a x-to-many relationship. So, you might create "user_profile", "user_credentials", "user_fb", "user_activity", all linked by user_id. This makes it neater, and if you have to add more facebook-related fields, they won't dangle at the end of the table. It won't make your database faster or more scalable, though. The cost of the joins is likely to be negligible.

Whatever you do, option 2 - serializing "rarely used fields" into a single text field - is a terrible idea. You can't validate the data (so dates could be invalid, numbers might be text, not-nulls might be missing), and any use in a "where" clause becomes very slow.

A popular alternative is "Entity/Attribute/Value" or "Key/Value" stores. This solution has some benefits - you can store your data in a relational database even if your schema changes or is unknown at design time. However, they also have drawbacks: it's hard to validate the data at the database level (data type and nullability), it's hard to make meaningful links to other tables using foreign key relationships, and querying the data can become very complicated - imagine finding all records where the status is 1 and the facebook_id is null and the registration date is greater than yesterday.

Given that you appear to know the schema of your data, I'd say "key/value" is not a good choice.

like image 72
Neville Kuyt Avatar answered Dec 31 '22 08:12

Neville Kuyt


I would advice to run some tests. Try it both ways and benchmark it. Nobody will be able to give you a definitive answer because you have not shared your hardware configuration, sample data, sample queries, how you plan on using the data etc. Here is some information that you may want to consider.

Use The Database as it was intended

A relational database is designed specifically to handle data. Use it as such. When written correctly, joining data in a well written schema will perform well. You can use EXPLAIN to optimize queries. You can log SLOW queries and improve their performance. Databases have been around for years, if putting everything into a single table improved performance, don't you think that would be all the buzz on the internet and everyone would be doing it?

Engine Types

How will inserts be affected as the row count grows? Are you using MyISAM or InnoDB? You will most likely want to use InnoDB so you get row level locking and not table. Make sure you are using the correct Engine type for your tables. Get the information you need to understand the pros and cons of both. The wrong engine type can kill performance.

Enhancing Performance using Partitions

Find ways to enhance performance. For example, as your datasets grow you could partition the data. Data partitioning will improve the performance of a large dataset by keeping slices of the data in separate partions allowing you to run queries on parts of large datasets instead of all of the information.

Use correct column types

Consider using UUID Primary Keys for portability and future growth. If you use proper column types, it will improve performance of your data.

Do not serialize data

Using serialized data is the worse way to go. When you use serialized fields, you are basically using the database as a file management system. It will save and retrieve the "file", but then your code will be responsible for unserializing, searching, sorting, etc. I just spent a year trying to unravel a mess like that. It's not what a database was intended to be used for. Anyone advising you to do that is not only giving you bad advice, they do not know what they are doing. There are very few circumstances where you would use serialized data in a database.

Conclusion

In the end, you have to make the final decision. Just make sure you are well informed and educated on the pros and cons of how you store data. The last piece of advice I would give is to find out what heavy users of mysql are doing. Do you think they store data in a single table? Or do they build a relational model and use it the way it was designed to be used?

When you say "I am going to put everything into a single table", you are saying that you know more about performance and can make better choices for optimization in your code than the team of developers that constantly work on MySQL to make it what it is today. Consider weighing your knowledge against the cumulative knowledge of the MySQL team and the DBAs, companies, and members of the database community who use it every day.

like image 29
Chuck Burgess Avatar answered Dec 31 '22 06:12

Chuck Burgess