I am wondering what would be the best approach to store, let's say, languages in a user table when the user can have as many languages as he wishes, hopefully without using serialized data, as this field will be searched intensively.
I was thinking of limiting the number of entries, for example to a maximum of 4 languages, and having lang1, lang2, ... columns in the user table.
Is there a better way to achieve this ?
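We can use VARCHAR(<maximum_limit>). In MySQL the declared limit can go up to 65,535, but that ceiling is really in bytes: the 65,535-byte maximum row size is shared by all columns, so the usable length is smaller, especially with multibyte character sets.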
Make one table for this purpose, with two columns, where the first column identifies a name for the value, and the second column is the actual value. Let the data type of the value be string. This is a minor disadvantage when you actually need a boolean, but that way your table is more flexible.
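A minimal sketch of that name/value table, assuming an SQLite database as a stand-in for MySQL; the table name `setting` and the sample rows are hypothetical. It shows the trade-off mentioned above: every value comes back as a string, so the caller converts numbers and booleans.

```python
import sqlite3

# Hypothetical two-column name/value table; every value is stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE setting (name TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO setting (name, value) VALUES (?, ?)",
    [("site_title", "My Site"), ("max_users", "500"), ("maintenance_mode", "false")],
)


def get_setting(name):
    """Return the stored string for `name`, or None if it does not exist."""
    row = conn.execute("SELECT value FROM setting WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]


# Because values are strings, non-string settings must be converted by the caller.
max_users = int(get_setting("max_users"))
maintenance = get_setting("maintenance_mode") == "true"
print(max_users, maintenance)
```

The flexibility comes from never altering the table when a new setting appears; the cost is that type checking moves out of the database and into your code.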
The MEDIUMTEXT data type is useful for storing larger text strings such as white papers, books, and code backups. A MEDIUMTEXT value can hold up to 16,777,215 bytes (2^24 - 1, about 16 MB), which means fewer characters when a multibyte character set is used, and each value needs 3 extra bytes to store its length.
Storing the data in the database: all the PHP script needs is a standard INSERT query that uses the binary data as the value of the file column. To get the file data for the query, call PHP's file_get_contents() function, which reads a whole file into a string: $file_data = file_get_contents($file);
It's called database normalization. Specifically, you need to map a "many-to-many" association.
You need three tables:
User(id, name)
Language (id, language_name)
User_Language(id,id_user,id_language)
To get all the languages for the user with id 3:
SELECT l.language_name
FROM User u
JOIN User_Language ul ON (u.id = ul.id_user)
JOIN Language l ON (l.id = ul.id_language)
WHERE u.id = 3
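Here is a small runnable sketch of the three tables and the query above, assuming SQLite as a stand-in for MySQL and some made-up sample rows. Following the advice in the EDIT, the join table uses the pair (id_user, id_language) as its primary key instead of a separate id column.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Language (
    id            INTEGER PRIMARY KEY,
    language_name TEXT NOT NULL UNIQUE
);
-- One row per (user, language) pair; the pair itself is the primary key.
CREATE TABLE User_Language (
    id_user     INTEGER NOT NULL REFERENCES User(id),
    id_language INTEGER NOT NULL REFERENCES Language(id),
    PRIMARY KEY (id_user, id_language)
);
INSERT INTO User VALUES (1, 'John'), (2, 'Mary'), (3, 'Ana');
INSERT INTO Language VALUES (1, 'spanish'), (2, 'english'), (3, 'japanese');
INSERT INTO User_Language VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 2), (3, 3);
""")


def languages_for_user(user_id):
    """Return one user's language names using the three-table join."""
    rows = conn.execute(
        """
        SELECT l.language_name
        FROM User u
        JOIN User_Language ul ON (u.id = ul.id_user)
        JOIN Language l ON (l.id = ul.id_language)
        WHERE u.id = ?
        ORDER BY l.language_name
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(languages_for_user(3))  # ['english', 'japanese']
```

A user can have any number of languages this way: adding one is just another row in User_Language, with no schema change.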
EDIT:
Two things are important to notice, @silkAdmin. First, as @BryceAtNetwork23 noted, there's no need for an id column on the User_Language table: the pair (id_user, id_language) already identifies each row and can serve as its primary key. Second, you should learn about joins, especially MySQL joins (because SQL tends to differ between DB engines). Once you dig a little deeper, you'll see that joining the User table in the previous query isn't needed either; it can be simplified to:
SELECT l.language_name
FROM User_Language ul
JOIN Language l ON (l.id = ul.id_language)
WHERE ul.id_user = 3
But I included it in the first query to make things easier for you.
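A quick sketch of the simplified query, again assuming SQLite in place of MySQL and invented sample rows. It also shows why the composite primary key makes a separate id unnecessary: the database itself rejects a duplicate (user, language) pair.

```python
import sqlite3

# Minimal in-memory setup; the User table is not even needed for this query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Language (id INTEGER PRIMARY KEY, language_name TEXT NOT NULL);
CREATE TABLE User_Language (
    id_user     INTEGER NOT NULL,
    id_language INTEGER NOT NULL,
    PRIMARY KEY (id_user, id_language)
);
INSERT INTO Language VALUES (1, 'spanish'), (2, 'english'), (3, 'japanese');
INSERT INTO User_Language VALUES (1, 1), (1, 2), (3, 2), (3, 3);
""")

# User_Language already holds the user id, so one join is enough.
rows = conn.execute(
    """
    SELECT l.language_name
    FROM User_Language ul
    JOIN Language l ON (l.id = ul.id_language)
    WHERE ul.id_user = ?
    ORDER BY l.language_name
    """,
    (3,),
).fetchall()
names = [name for (name,) in rows]
print(names)  # ['english', 'japanese']

# The composite primary key rejects the same (user, language) pair twice.
try:
    conn.execute("INSERT INTO User_Language VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```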
Why use a Language table
My answer just reflects the way I'd do it. There are plenty of ways to accomplish what you've asked for. That said, let me explain my reasoning.
Let's think about the extremes. The first extreme is to store the languages in the user table, as you suggested. For example, we can have a single column and separate the values with semicolons, something like this:
User: (1, "John", "spanish;english;japanese")
The advantage is that you won't need any join: given the id of your user, you get the languages directly. The disadvantage is that searching on that column is really painful. How do you get all your users who speak "Spanish"? (The bottom line here is that you can't index the individual values.) Another disadvantage, which is somewhat outdated now, is wasted disk space. Back when databases and normalization were invented, disk space was really expensive. So, storing this:
User: (1, "John", "spanish;english;japanese")
User: (2, "Mary", "spanish;english")
was something that couldn't be tolerated. So someone came along and said: "Hey, let's use ids, so we can turn it into":
User: (1, "John", "1;2;3")
User: (2, "Mary", "1;2")
Language (1,"spanish")
Language (2,"english")
Language (3,"japanese")
For 10,000 users and just a few hundred languages, that's a huge improvement in disk usage (nowadays this may not matter much anymore, and I'll come back to that later). That solved the disk problem, but we still have the search problem. Again, how do you get all your users who speak "Spanish"? With this design, you would have to iterate over the users table, read the language column, split it on ";" and look for the id 1.
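To make that cost concrete, here is a sketch of the search this design forces on you, written in plain Python over a hypothetical in-memory copy of the rows above (no database involved). Every user row has to be visited and split, no matter how few users actually speak the language.

```python
# Hypothetical rows mirroring the example: language ids packed as "1;2;3".
users = [(1, "John", "1;2;3"), (2, "Mary", "1;2")]
languages = {1: "spanish", 2: "english", 3: "japanese"}


def users_speaking(language_name):
    """Full scan: split every user's id list and look for the wanted id."""
    wanted = next((i for i, name in languages.items() if name == language_name), None)
    if wanted is None:
        return []
    result = []
    for _user_id, name, packed in users:  # every row must be visited
        ids = {int(part) for part in packed.split(";") if part}
        if wanted in ids:
            result.append(name)
    return result


print(users_speaking("spanish"))   # ['John', 'Mary']
print(users_speaking("japanese"))  # ['John']
```

With the User_Language table, the same question becomes an indexed lookup on id_language instead of a scan over every user.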
That's why we started using the approach I showed you before.
So far, so good. Pretty good explanation ;)
Big disclaimer
As I said before, there are several ways to do this. It depends on your case and what you want to achieve. If you want to search on that column (for example, "give me the users who speak English"), you should consider the design I described at the top of my answer.
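For the "give me the users who speak English" case, the normalized design answers it with a plain join in the other direction. A sketch, assuming SQLite instead of MySQL and invented sample rows:

```python
import sqlite3

# Same three-table design; the sample data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Language (id INTEGER PRIMARY KEY, language_name TEXT NOT NULL UNIQUE);
CREATE TABLE User_Language (
    id_user     INTEGER NOT NULL,
    id_language INTEGER NOT NULL,
    PRIMARY KEY (id_user, id_language)
);
INSERT INTO User VALUES (1, 'John'), (2, 'Mary'), (3, 'Kenji');
INSERT INTO Language VALUES (1, 'spanish'), (2, 'english'), (3, 'japanese');
INSERT INTO User_Language VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 3);
""")


def users_who_speak(language_name):
    """Start from the language and walk the join table back to the users."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM Language l
        JOIN User_Language ul ON (ul.id_language = l.id)
        JOIN User u ON (u.id = ul.id_user)
        WHERE l.language_name = ?
        ORDER BY u.id
        """,
        (language_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(users_who_speak("english"))  # ['John', 'Mary']
```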
Right now there is a "new wave" of data solutions called NoSQL databases (they vary a lot) that try to denormalize data. If you're concerned about over-normalizing your schemas, you should take a look at them. I recommend MongoDB and CouchDB, because those are the easiest to start with.
About joins
Don't worry about the performance of two joins. If you have performance issues, this won't be the cause: DB engines are built for exactly this purpose. With a good memory cache and proper indexes, it should work smoothly.
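On the index side, a sketch of what "proper indexes" could mean for the join table, assuming SQLite for a self-contained run (in MySQL you would declare the same indexes). The composite primary key covers "languages of user X"; the extra index, whose name `idx_user_language_lang` is hypothetical, covers the reverse lookup "users who speak language Y". EXPLAIN QUERY PLAN is SQLite's way of showing whether a query uses an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User_Language (
    id_user     INTEGER NOT NULL,
    id_language INTEGER NOT NULL,
    -- Serves lookups by user (leading column id_user).
    PRIMARY KEY (id_user, id_language)
);
-- Hypothetical extra index for lookups by language.
CREATE INDEX idx_user_language_lang ON User_Language (id_language, id_user);
""")

# Ask SQLite how it would run the reverse lookup; the last column of each
# plan row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id_user FROM User_Language WHERE id_language = ?",
    (2,),
).fetchall()
for row in plan:
    print(row[-1])

# List the indexes SQLite now keeps for the join table.
index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'User_Language'"
    )
]
print(index_names)
```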
Yes, the best way is to use an additional table with the columns lang_id and user_id. There you can store any number of user/language associations (one per row).