I am rebuilding the background system of a site with a lot of traffic.
This is the core of the application and the way I build this part of the database is critical for a big chunk of code and upcoming work. The system described below will have to run millions of times each day. I would appreciate any input on the issue.
The background is that a user can add what he or she has been eating during the day.
Simplified, the process is more or less this:
id, user_id, date, type, food_data
.What I currently have is a huge database with food items from which the search will be performed. The food items are stored with information on both the common name (like "pork cutlets") and on producer (like "coca cola"), along with other detailed information needed.
Question summary:
My problem is that I do not know the best way to store the data for it to be easily accessible in the way I need it and without the database going out of hand.
Consider 1 million users adding 1 to 7 meals each day. To store each food item for each meal, each day and each user would potentially create (1*avg_num_meals*avg_num_food_items) million rows each day.
Storing the data in some compressed way (like the food_data is an json_encoded string), would lessen the amount of rows significally, but at the same time making it hard to create the 'most used food items'-list and other statistics on the fly.
Should the table be split into several tables? If this is the case, how would they interact?
The site is currently hosted on a mid-range CDN and is using a LAMP (Linux, Apache, MySQL, PHP) backbone.
Roughly, you want a fully normalized data structure for this. You want to have one table for Users, one table for Meals (one entry per meal, with a reference to User; you probably also want to have a time / date of the meal in this table), and a table for MealItems, which is simply an association table between Meal and the Food Items table.
So when a User comes in and creates an account, you make an entry in the Users table. When a user reports a Meal they've eaten, you create a record in the Meals table, and a record in the MealItems table for every item they reported.
This structure makes it straightforward to have a variable number of items with every meal, without wasting a lot of space. You can determine the representation of items in meals with a relatively simple query, as well as determining just what the total set of items any one user has consumed in any given timespan.
This normalized table structure will support a VERY large number of records and support a large number of queries against the database.
First,
Storing the data in some compressed way (like the food_data is an json_encoded string)
is not a recommended idea. This will cause you countless headaches in the future as new requirements are added.
You should definitely have a few tables here.
Users
id, etc
Food Items
id, name, description, etc
Meals
id, user_id, category, etc
Meal Items
id, food_item_id, meal_id
The Meal Items would tie the Meals to the Food Items using ids. The Meals would be tied to Users using ids. This makes it simple to use joins in order to get detailed lists of data- totals, averages, etc. If the fields are properly indexed, this should be a great model to support a large number of records.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With