I have a Zip Code MySQL database that has the lat/lng of each zip code. How do I go about finding the center zip code for each state? I would be doing this in PHP.
ZIP codes are less geographically scattered than US area codes, but they are not a coordinate system: numerically close codes are not guaranteed to be physically close. The one exception is ZIP+4, whose codes are subdivisions of the five-digit ZIP code, so you can assume that ZIP+4 codes sharing the same ZIP code are close to each other.
The ZIP Code Population Weighted Centroids dataset allows researchers and analysts to estimate the center of population of a given geography rather than its geometric center. The data used to estimate the ZIP code centroids is extracted from administrative USPS address data.
ZIP Codes are numbered with the first digit representing a certain group of U.S. states, the second and third digits together representing a region in that group (or perhaps a large city) and the fourth and fifth digits representing a group of delivery addresses within that region.
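As a small illustration of that digit structure, here is a minimal Python sketch that splits a five-digit ZIP code into its three parts. The function name `split_zip` and the dictionary keys are made up for this example, and the same slicing is easy to port to PHP with `substr`.

```python
def split_zip(zip_code: str) -> dict:
    """Split a five-digit ZIP code into the three parts described above."""
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"expected five digits, got {zip_code!r}")
    return {
        "state_group": zip_code[0],      # first digit: group of U.S. states
        "region": zip_code[1:3],         # second and third digits: region or large city
        "delivery_area": zip_code[3:5],  # fourth and fifth digits: group of delivery addresses
    }
```

Note that the ZIP code is handled as a string, not an integer, so leading zeros survive.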
Assuming that you wish to treat the coordinates of each zip code as equally weighted points within each state, and find the ZIP code closest to the effective "centre of mass", you can do it directly in MySQL by combining:
How to calculate geographical centre;
the Finding Locations with MySQL section of Google's article on Creating a Store Locator with PHP, MySQL & Google Maps; and
the general technique for obtaining a groupwise minimum/maximum.
The complete query is:
-- the closest ZIP code
SELECT state, z.zip_code
FROM zip_codes z JOIN (
  -- the distance between the "centre of mass" and the closest ZIP code
  SELECT state, c.lat, c.lon,
         MIN(ACOS(
             COS(c.lat) * COS(RADIANS(z.lat)) * COS(RADIANS(z.lon) - c.lon)
           + SIN(c.lat) * SIN(RADIANS(z.lat))
         )) AS min_dist
  FROM zip_codes z JOIN (
    -- the "centre of mass" of each state (lat and lon are in radians)
    SELECT state,
           ATAN2(
             SUM(SIN(RADIANS(lat))) / COUNT(*),
             SQRT(
                 POW(SUM(COS(RADIANS(lat)) * SIN(RADIANS(lon))) / COUNT(*), 2)
               + POW(SUM(COS(RADIANS(lat)) * COS(RADIANS(lon))) / COUNT(*), 2)
             )
           ) AS lat,
           ATAN2(
             SUM(COS(RADIANS(lat)) * SIN(RADIANS(lon))) / COUNT(*),
             SUM(COS(RADIANS(lat)) * COS(RADIANS(lon))) / COUNT(*)
           ) AS lon
    FROM zip_codes
    GROUP BY state
  ) c USING (state)
  -- c.lat and c.lon are constant per state; listing them keeps ONLY_FULL_GROUP_BY happy
  GROUP BY state, c.lat, c.lon
) d USING (state)
-- keep only the ZIP code(s) whose distance equals the per-state minimum
WHERE ACOS(
    COS(d.lat) * COS(RADIANS(z.lat)) * COS(RADIANS(z.lon) - d.lon)
  + SIN(d.lat) * SIN(RADIANS(z.lat))
) = d.min_dist
This could prove pretty slow, as indexing (other than on the state column) will not be of much use, but then again it's a one-time operation, so caching the result shouldn't prove too much of a problem.
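If you want to sanity-check the query's output outside the database, the same logic can be sketched in a few lines of Python (it ports to PHP almost line for line). This is only an illustration under assumptions: the function names and the `zips` dictionary shape are invented here, and the data for one state is assumed to fit in memory.

```python
import math

def centre_of_mass(points):
    """Mean of the unit vectors for (lat, lon) points in degrees; returns (lat, lon) in degrees."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def angular_distance(a, b):
    """Great-circle angle in radians between two (lat, lon) points given in degrees."""
    phi1, lam1 = map(math.radians, a)
    phi2, lam2 = map(math.radians, b)
    cosine = (math.cos(phi1) * math.cos(phi2) * math.cos(lam2 - lam1)
              + math.sin(phi1) * math.sin(phi2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine)))

def central_zip(zips):
    """zips: dict mapping zip_code -> (lat, lon) for ONE state; returns the nearest ZIP code."""
    centre = centre_of_mass(list(zips.values()))
    return min(zips, key=lambda code: angular_distance(zips[code], centre))
```

To use it, load the rows for a single state into a dictionary such as `{"00001": (lat, lon), ...}` and call `central_zip` on it. Unlike the query, this returns a single ZIP code even when several are tied for the minimum distance.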
There are many ZIP codes in densely populated areas, and few in sparsely populated areas. As a result, the determined "centre of mass" may be some distance away from the geographical centre (but it could be a reasonable proxy for the population centre, if that's what is desired).
Adding a suitable weighting to each ZIP code would yield reasonable approximations: e.g. weight by the total land area each ZIP code covers to find geographical centre; or by population resident within each ZIP code to find actual population centre.
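A weighted version only changes the averaging step: each ZIP code's unit vector is multiplied by its weight before summing. The Python sketch below assumes you have a per-ZIP weight (land area or population) from some other source; the function name and the `(lat, lon, weight)` tuple layout are hypothetical.

```python
import math

def weighted_centre(points):
    """points: iterable of (lat, lon, weight) with lat/lon in degrees.

    Returns the weighted "centre of mass" as (lat, lon) in degrees.
    """
    x = y = z = total = 0.0
    for lat, lon, weight in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += weight * math.cos(phi) * math.cos(lam)
        y += weight * math.cos(phi) * math.sin(lam)
        z += weight * math.sin(phi)
        total += weight
    if total <= 0:
        raise ValueError("total weight must be positive")
    x, y, z = x / total, y / total, z / total
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))
```

In the MySQL query, the equivalent change would be to multiply each term inside the `SUM(...)` calls by a weight column and divide by the sum of the weights instead of `COUNT(*)`, assuming such a column exists in your table.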
The only way to have a true geographical centre would be to derive the centroid from each state's borders. You can download the coordinates of suitable bounding polygons from the CloudMade Downloads site.
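Once you have a border polygon, a rough centroid can be computed with the standard shoelace-based formula. The Python sketch below is a simplification under stated assumptions: it treats longitude and latitude as flat x/y coordinates (so it ignores map-projection distortion), it handles a single simple ring only (no islands or holes), and the `ring` input format is hypothetical rather than that of any particular download.

```python
def polygon_centroid(ring):
    """Planar centroid of a simple polygon.

    ring: list of (lon, lat) vertices in order; repeating the first vertex
    at the end is allowed but not required.
    Returns (lon, lat) of the centroid.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * area2), cy / (3 * area2)
```

The resulting point could then be passed to the same nearest-ZIP search used above in place of the "centre of mass". For large states or those at high latitudes, projecting the coordinates to an equal-area projection first would likely give a better answer.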