I am trying to design an ecommerce web application in MySQL and I am having problems choosing the correct primary keys for the user table. the example given is just a sample example for illustration.
user table have following definition
CREATE TABLE IF NOT EXISTS `mydb`.`user` (
`id` INT NOT NULL ,
`username` VARCHAR(25) NOT NULL ,
`email` VARCHAR(25) NOT NULL ,
`external_customer_id` INT NOT NULL ,
`subscription_end_date` DATETIME NULL ,
`column_1` VARCHAR(45) NULL ,
`column_2` VARCHAR(45) NULL ,
`colum_3` VARCHAR(45) NULL ,
PRIMARY KEY (`id`) ,
UNIQUE INDEX `username_UNIQUE` (`username` ASC) ,
UNIQUE INDEX `email_UNIQUE` (`email` ASC) ,
UNIQUE INDEX `customer_id_UNIQUE` (`external_customer_id` ASC) )
ENGINE = InnoDB
I am facing following issues with the primary key candidate columns:
Id column
Pros
cons
email column
Pros
Cons
username column
Pros
Cons
external_customer column
pros
can be used as an external reference for a customer and holds no information (maybe non-editable username can be used instead? )
cons
might leaks information if it is auto incremental (if possible)
what are the common practice when choosing user table primary keys for a scalable ecommerce web application? all feedback appreciated
I don't have anything to say about some of your analysis. If I've cut some of your pros or cons, that only means I don't think I have anything useful to add.
Id column
Pros
- No business meaning (stable primary key)
- faster table joins
- compacter index
First, any column or set of columns declared NOT NULL UNIQUE has all the properties of a primary key. You can use any of them as the target for a foreign key reference, which is what all this is really about.
In your case, your structure allows 4 columns to be targets of a foreign key reference: id, username, email, and external_customer_id. You don't have to use the same one all the time. It might make sense to use id for 90% of your FK references, and email for 10% of them.
Stability doesn't have anything to do with whether a column has business meaning. Stability has to do with how often, and under what circumstances, a value might change. "Stable" doesn't mean "immutable" unless you're running Oracle. (Oracle can't do ON UPDATE CASCADE.)
Depending on your table structure and indexing, a natural key might perform faster. Natural keys make some joins unnecessary. I did tests before I built our production database. It's probably going to be decades before we reach the point that joins on ID numbers will outperform fewer joins and natural keys. I've written about those tests either on SO or on DBA.
You have three other unique indexes. (Good for you. I think at least 90% of the people who build a database don't get that right.) So it's not just that an index on an ID number is more compact than either of those three; it's also an additional index. (In this table.)
email column
Pros
- None
An email address can be considered stable and unique. You can't stop people from sharing email addresses, regardless of whether it's the target for a foreign key reference.
But email addresses can be "lost". In the USA, most university students lose their *.edu email addresses with a year or so of graduation. If your email address comes through a domain that you're paying for, and you stop paying, the email address goes away. I imagine it's possible for email address like those to be given to new users. Whether that creates an unbearable burden is application-dependent.
Cons
- a user should be able to change the email address. Not suitable for primary key
All values in a SQL database can be changed. It's only unsuitable if your environment doesn't let your dbms honor an ON UPDATE CASCADE declaration in a timely manner. My environment does. (But I run PostgreSQL on decent, unshared hardware.) YMMV.
username column
Pros
- a "natural" primary key
- Less table joins
- simpler and more "natural" queries
Fewer joins is an important point. I have been on consulting gigs where I've seen the mindless use of ID numbers made people write queries with 40+ joins. Judicious use of natural keys eliminated up to 75% of them.
It's not important to always use surrogate keys as the target for your foreign keys (unless Oracle) or to always use natural keys as the target. It's important to think.
Cons
- varchar column is slower when joining tables
- an index on a varchar column is less compact than int column index
You can't really say that joining on a varchar() is slower without qualifying that claim. The fact is that, although most joins on varchar() are slower than joins on id numbers, they're not necessarily so slow that you can't use them. If a query takes 4ms with id numbers, and 6ms with varchar(), I don't think that's a good reason to disqualify the varchar(). Also, using a natural key will eliminate a lot of joins, so overall system response might be faster. (Other things being equal, 40 4ms joins will underperform 10 6ms joins.)
I can't recall any case in my database career (25+ years) where the width of an index was the deciding factor in choosing the target for a foreign key.
external_customer column
pros
- can be used as an external reference for a customer and holds no information (maybe non-editable username can be used instead? )
There are actually few systems that let me change my username. Most will let me change my real name (I think), but not my username. I think an uneditable username is completely reasonable.
In general, web applications try to keep their database schema away from the customer - including primary keys. I think you're conflating your schema design with authentication methods - there's nothing stopping you from allowing users to log in with their email address, even if your database design uses an integer to uniquely identify them.
Whenever I've designed systems like this, I've used an ID column - either integer or GUID for the primary key. It's fast, doesn't change due to pesky real life situations, and is a familiar idiom to developers.
I've then worked out the best authentication scheme for the app in hand - most people expect to login with their email address these days, so I'd stick with that. Of course, you could also let them login with their Facebook, Twitter, or Google accounts. Has nothing to do with my primary key, though...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With