What is the correct column type for holding ZipCode
values in PostgreSQL
database?
I strongly disagree with the advice presented here.
- The accepted answer accepts things that aren't digits.
- The question is about Zip Codes, not postal codes.
- If we assume the post is wrong and means international postal codes, there are characters that appear in international postal codes that don't appear in that list, and many international - and also the US domestic - postal codes can be over ten characters
- If we actually answer the question they asked, about zip codes, then there should be no accomodation for anything but digits (and arguably the hyphen)
- US zip codes can be up to 11 digits long (13 characters counting the two dashes) - there is a zip, a zip+4, and a zip+6 (which programmers would call zip+4+2) notation; the last is used by skyscrapers, universities, et cetera
- US zip codes are always non-negative integers, and therefore should not be stored as text data, which is subject to non-canon representation problems (ask anyone who's done a system about that time they found out that their zip 00203 didn't match the zip 203 that they accidentally got when constantly unnecessarily parsing string representations)
- If you pretend you're actually tracking international post codes, the short character sequence limited text fields here don't even begin to do the job. The word "China" comes to mind.
My opinon:
- Decide whether you're actually handling US postal codes or international
- If you're handling US postal codes, track them as unsigned integers, and left-pad them with zeros when text representing them. (Think unix timestamps and local TZ representations if you need to understand why this will be simpler in the long run.)
- If you're handling international post codes, store them in an unbounded unicode string, tie them to the country they represent, and validate country by country with check constraints. This problem is far more difficult than it sounds up front. International addresses are some of the least standardized things on Earth. Wait'll you find out how Japanese house numbers work, or why the British postal 6-code has the gaps it has.
It is something like xxxxx-xxxx, so varchar(10)
is recommended.
If you want to check the syntax of the values in the database, you could create a domain
type for zip codes.
CREATE DOMAIN zipcode varchar(10)
CONSTRAINT valid_zipcode
CHECK (VALUE ~ '[A-Z0-9-]+'); -- or a better regular expression
You could have a look at this site, which proposes this regex:
(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)
But you should check it works for the PostgreSQL
regex syntax.