Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database schema - representing location

Tags:

I'll need to represent location of some events and I am designing database schema for this application. I have two approaches for presenting the location:

approach 1: 4 tables:

  • Countries
  • States
  • Cities
  • Locations (in location I have foreign key to country_id, state_id and city_id)

approach 2: 1 table:

  • Locations and there are simply fields country, state, city which are stored as text (no foreign id's)

Which approach would you recommend? the first one will help to eliminate possible different names of e.g. the same country (usa, us, united states etc.) and could be helpful in providing suggestions when writing in textboxes which probably will be mandatory.

however, the second approach seems like it will allow keep everything much more simple and should reduce number of queries to the database.

Which one do you think is better? Do you know what are best practices in this case? E.g. how did it do some big portals where they also need something like location (e.g. foursquare etc.). Afaik facebook uses the second approach, but... I want to hear your opinions and possibly reasons why would you choose one approach over another.

Thanks!

like image 519
Bart Avatar asked Jul 11 '11 07:07

Bart


People also ask

What are the 3 types of schema in the database?

Schema is of three types: Logical Schema, Physical Schema and view Schema. Logical Schema – It describes the database designed at logical level. Physical Schema – It describes the database designed at physical level. View Schema – It defines the design of the database at the view level.

How do you represent a database schema?

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data.

What are the 2 main types of database schemas?

There are two main kinds of database schema: A logical database schema conveys the logical constraints that apply to the stored data. It may define integrity constraints, views, and tables. A physical database schema lays out how data is stored physically on a storage system in terms of files and indices.

What should be included in a database schema?

A database schema will include:All important or relevant data. Consistent formatting for all data entries. Unique keys for all entries and database objects. Each column in a table has a name and data type.


1 Answers

Approach #1:

This is a good solution if you want a good normalized database. You can manage all your tables easily but you will have to have 3 left/inner join when you query the location. I assume everything is indexed properly so you won't have really trouble with performance since these tables will be relatively small (country and states) and medium size for Cities (if you only want all cities for a specific country only). If you want ALL the cities in the world that table will be huge and you might have performance issue at some point if you don't index or join the table correctly.

Since everything is in the database, you don't have to change code if you need to add, update or delete a record.

If you need to add, update or delete any records, this solution will be very easy to maintain. If you need to update a name (for example city name) and all records will be updated at once.

Queries will be faster to run if you look by city or state will be fast, then a simple left join to get the name will do the trick.

Approach #2:

I personally would not recommend this because for maintainability it is not the best solution. If someday you need to retrieve data based on a city, your query might be slow to execute if you don't index properly. If you index the country, state, city then it will be faster for lookup (but slower than the first approach since varchar is slower than int for indexing). Also, you increase the risk of errors for the names eg: New York VS newyork VS New Yrok.

Also, if you need to update a name of a city, you will have to retreive all the records that has that name then update all these records. Which can take a long time.

eg: UPDATE locations SET city = 'New York' where city = 'newyork'; *note: also if you have misspells, you will have to validate ALL records to make sure you update all records

Here's a skeleton based on your requirement (using MYSQL) for approach #1:

CREATE TABLE `countries` (   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,   `name` varchar(200) NOT NULL DEFAULT '',   PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;  CREATE TABLE `states` (   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,   `name` varchar(200) NOT NULL DEFAULT '',   `fk_country_id` int(10) NOT NULL DEFAULT '0',   PRIMARY KEY (`id`),   KEY `fk_country_id` (`fk_country_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;  CREATE TABLE `cities` (   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,   `name` varchar(200) NOT NULL DEFAULT '',   `fk_state_id` int(10) NOT NULL DEFAULT '0',   PRIMARY KEY (`id`),   KEY `fk_state_id` (`fk_state_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;  CREATE TABLE `locations` (   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,   `name` varchar(200) NOT NULL DEFAULT '',   `fk_country_id` int(10) NOT NULL DEFAULT '0',   `fk_state_id` int(10) NOT NULL DEFAULT '0',   `fk_cities_id` int(10) NOT NULL DEFAULT '0',   PRIMARY KEY (`id`),   KEY `fk_country_id` (`fk_country_id`),   KEY `fk_state_id` (`fk_state_id`),   KEY `fk_cities_id` (`fk_state_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;  /* This table should not have fk_country_id and fk_state_id since they are already in their respective tables. but for this requirement I will not remove them from the table */  SELECT locations.name AS location, cities.name AS city, states.name AS state, countries.name AS country from locations INNER JOIN cities ON (cities.id = fk_cities_id) INNER JOIN states ON (states.id = locations.fk_state_id) INNER JOIN countries ON (countries.id = locations.fk_country_id); +-------------------+---------------+----------+---------------+ | location          | cty          | state    | country       | +-------------------+---------------+----------+---------------+ | Statue of Liberty | New York City | New York | United States | +-------------------+---------------+----------+---------------+ 1 row in set (0.00 sec)  EXPLAIN: +----+-------------+-----------+--------+----------------------------------------+---------+---------+-------+------+-------+ | id | select_type | table     | type   | possible_keys                          | key     | key_len | ref   | rows | Extra | +----+-------------+-----------+--------+----------------------------------------+---------+---------+-------+------+-------+ |  1 | SIMPLE      | locations | system | fk_country_id,fk_state_id,fk_cities_id | NULL    | NULL    | NULL  | 7174 |       | |  1 | SIMPLE      | cities    | const  | PRIMARY                                | PRIMARY | 4       | const |    1 |       | |  1 | SIMPLE      | states    | const  | PRIMARY                                | PRIMARY | 4       | const |    1 |       | |  1 | SIMPLE      | countries | const  | PRIMARY                                | PRIMARY | 4       | const |    1 |       | +----+-------------+-----------+--------+----------------------------------------+---------+---------+-------+------+-------+ 

Now update:

UPDATE states SET name = 'New York' WHERE ID = 1; //using the primary for update - we only have 1 New York City record in the DB Query OK, 0 rows affected (0.00 sec) Rows matched: 1  Changed: 1  Warnings: 0 

Now if I look all my locations for that city, all will say: New York

For approach #2:

CREATE TABLE `locations` (   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,   `name` varchar(200) NOT NULL DEFAULT '',   `fk_country_id` varchar(200) NOT NULL default '',   `fk_state_id` varchar(200) NOT NULL default '',   `fk_cities_id` varchar(200) NOT NULL default '',   PRIMARY KEY (`id`),   KEY `fk_country_id` (`fk_country_id`),   KEY `fk_state_id` (`fk_state_id`),   KEY `fk_cities_id` (`fk_state_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;   SELECT location, city, state, country FROM locations; +-------------------+---------------+----------+---------------+ | location          | city          | state    | country       | +-------------------+---------------+----------+---------------+ | Statue of Liberty | New York City | New York | United States | +-------------------+---------------+----------+---------------+ 

Now update:

UPDATE locations SET name = 'New York' WHERE name = 'New York City'; // can't use the primary key for update since they are varchars Query OK, 0 rows affected (1.29 sec) Rows matched: 151  Changed: 151  Warnings: 0 

Now if I look all my locations for that city, NOT all will say: New York

As you can see, it took 1.29 seconds (yes it's fast) but all records that has "New York" was updated but maybe there's some misspells or bad names etc...

Conclusion: For this reason only, I rather go with the first approach.

Note: Country and States rarely change. Maybe you can have these in your code and don't reference them from the database. This will save 2 INNER JOIN from the query and them in your code you simply retreive the ID of the country or state (same thing if you need to create an HTML drop down box).

Also, you can think about caching these countries and states using like memcached, APC, reddis or any other you prefer.

like image 180
Book Of Zeus Avatar answered Oct 18 '22 16:10

Book Of Zeus