I am wondering if there is some sort of "standard" for storing US addresses in a database? It seems this is a common task, and there should be some sort of a standard.
What I am looking for is a specific schema of how the database tables should work and interact, already in third normal form, including data types (MySQL). A good UML document would work.
Maybe I'm just being lazy, but this is a very common task, and I am sure someone has published an efficient way to do this somewhere. I just don't know where to look and Google isn't helping. Please point me to the resource. Thanks.
EDIT
Although this is more of a general question, I would like to clarify my specific needs.
Addresses will be used to specify road addresses of locations of events. These addresses will need to be in a format that can be best broken down and searched, and also used by any third-party applications I may end up linking my data source to.
ALSO. Data will be geo-coded (long, lat) on entry and stored separately, so it must fit the (yet undecided) protocol of whatever geocoder / application / library does that.
String data types are normally used to store names, addresses, descriptions or any value that contains letters and numbers including binary data, like image or audio files.
An address is stored in a compound type known as a pointer type.
DB provides data integrity between the file and its metadata. Database Security is available by default. Backups automatically include files, no extra management of file system necessary. Database indexes perform better than file system trees when more number of items are to be stored.
For international addresses, refer to the Universal Postal Union's Postal Addressing Systems database.
For U.S. addresses, refer to USPS Publication 28 "Postal Addressing Standards".
The USPS wants the following unpunctuated address components concatenated on a single line:
N
, SE
, etc.)AVE
, BLVD
, etc.)SW
, E
, etc.)APT
, STE
, etc.)E.g. 102 N MAIN ST SE APT B
If you keep the entire address line as a single field in your database, input and editing is easy, but searches can be more difficult (eg, in the case SOUTH EAST LANE
is the street EAST
as in S EAST LN
or is it LANE
as in SE LANE ST
?).
If you keep the address parsed into separate fields, searches for components like street name or apartments become easier, but you have to append everything together for output, you need CASS software to parse correctly, and PO boxes, rural route addresses, and APO/FPO addresses have special parsings.
A physical location with multiple addresses at that location is either a multiunit building, in which case letters/numbers after units like APT
and STE
designate the address, or it's a Commercial Mail Receiving Agency (eg, UPS store) and a maildrop/private mailbox number is appended (like 100 MAIN ST STE B PMB 102
), or it's a business with one USPS delivery point and mail is routed after USPS delivery (which usually requires a separate mailstop field which the company might need but the USPS won't want on the address line).
A contact with more than one physical address is usually a business or person with a street address and a PO box. Note that it's common for each address to have a different ZIP code.
It's quite typical that one business transaction might have a shipping address and a billing address (again, with different ZIP codes). The information I keep for EACH address is:
DR
, MS
, etc)III
, PHD
, etc)I typically print mail stops somewhere between the person's name and company because the country contains the state/ZIP which contains the city which contains the address which contains the company which contains the mail stop which contains the person. I use CASS software to validate and standardize addresses when entered or edited.
First, as a person who spend most of there professional day working with addresses, they are hard to manage from a data perspective.
If you ask 5 people what address they live at; you will find that you get 5 different answers. While you and I can tell that 123 Main Street Apt 1 and Apt 1 123 Main Street are the same address, the database program will have a challenge.
If you are using United States centric addresses CASS certified software from almost any vendor will standardize your addresses reasonably well. I would recommend a simple format as follows:
However, if you want a universal address I would look at the ADIS standard from IdeaAlliance. This standard can be used to breakdown (parse) addresses from almost any country into the relevant parts. Then they can be put back together using templates/components based on the Universal Postal Union standards (UPU S42 Standard on International Postal Address Components and Templates).
The big plus of this format is that addresses that dont exist in a postal database like CASS can be entered and stored as separate parts.
Very similar questions have been asked before.
Addresses are messy - at best.
It partly depends on what you want to do with the addresses. If you're going to use them to mail thing to people, then you simply need to record the image that will appear on the address label in a convenient form. If you're going to analyze the address, you have to work a lot harder.
Remember that the first time you have to deal with someone outside the US, all previous rules go astray. You may be strictly US-only, but beware.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With