Algorithm to get US ZIP codes from GIS x,y coordinates

I have a database of many tens of thousands of events that occurred at specific geographic locations within the United States. The data include x,y coordinates for each event, encoded using the NAD83 reference system. I want to write or use an algorithm to reliably get the US zip code associated with each NAD83 x,y coordinate.

I do not yet have zip code definitions using the NAD83 reference system. And I have never done this kind of programming before. But it just seems like it would be intuitively simple to find out whether a given x,y coordinate is located within a geometric shape of a US zip code defined using the same NAD83 reference system.

Can anyone help me with the following:
1.) Where do I get reliable US Zip Code definitions in the NAD83 reference system format?
2.) Where can I find example code for an algorithm to find the zip code given an x,y coordinate?

Any links you can send to instructional articles/tutorials, example code, and NAD83 zip code boundary definition data would be really helpful. I am doing Google searches, but I figured that people on this site might be able to give me more of an expert's guide.

I code in Java every day. But if the code you provide is not written in Java, I could take code written in another language and adapt it to Java for my purposes. I do not have database software installed on my computer because I just use csv or text files as inputs into my Java applications. If you suggest that I use a database, I would need links to instructions for how to get the data into a format that I can import into a programming language such as Java.

Finally, the street addresses in my dataset do not include zip codes, and the street addresses are written haphazardly, so that it would be very difficult to try to clean the address data up enough to try to get zip codes from the addresses. I can isolate the data to several adjacent cities, in perhaps a couple hundred zip codes, but I think that the NAD83 x,y coordinates are my best shot at deriving the zip code in which each event in my dataset occurred. I want to link my resulting zip code by zip code analyses with other data that I get about each zip code from sources like the US Census, etc.

Thank you in advance to anyone who is willing to help.

asked Jan 07 '12 02:01 by CodeMed


2 Answers

You can use GeoTools in Java. Here is an example that searches for a point in a shapefile.

// projection/datum in SR-ORG:7169 (GCS NAD83)
File shapeFile = new File("zt08_d00.shp");
FileDataStore store = FileDataStoreFinder.getDataStore(shapeFile);
SimpleFeatureSource featureSource = store.getFeatureSource();
// Boulder, CO (longitude first, then latitude)
Filter filter = CQL.toFilter("CONTAINS(the_geom, POINT(-105.292778 40.019444))");
SimpleFeatureCollection features = featureSource.getFeatures(filter);
// SimpleFeatureCollection is not Iterable; use its iterator and close it when done
try (SimpleFeatureIterator it = features.features()) {
    while (it.hasNext()) {
        System.out.println(it.next().getAttribute("NAME"));
    }
}

I grabbed a shapefile from the U.S. Census Bureau's collection of 5-Digit ZIP Code Tabulation Areas from the 2000 Census. I used only the single file for the state of Colorado. To cover more than one state, you would need to merge the per-state files into a single FeatureSource. Running this outputs 80302 for Boulder, CO.

GeoTools also allows you to convert between projections if needed. Luckily, these shapefiles are already in NAD83.

answered Sep 29 '22 18:09 by JRideout


I don't know where to get the ZIP code boundaries, but you can probably find them for each state with a web search.

As for question (2), you first need the geographic information, i.e. the boundary polygon of each area. Then, for each point (x,y), you determine which polygon it lies in.

Here is some sample code, written for SGU 124. Note that it only handles boundaries whose edges are all horizontal or vertical with integer coordinates, so it would need to be generalized for real boundary data.

#include <map>
#include <cstdio>
#include <cstring>
#include <algorithm>

#define MAXN 10005

using namespace std;

struct pnt{
    int x,y;
};
struct seg{
    pnt a,b;
}   s[MAXN];
int n;
pnt p;
int h[MAXN<<1];
int k[MAXN<<1];

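// Classifies point p against the closed axis-aligned polyline s[0..n-1] by casting
// a diagonal ray from p toward (-1,-1) and counting crossings (even-odd rule).
void work(){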
    int i,x,y,c = 0;
    memset(h,0,sizeof(h));
    memset(k,0,sizeof(k));
    for (i=0;i<n;i++){
        if (s[i].a.x<=p.x && p.x<=s[i].b.x && s[i].a.y<=p.y && p.y<=s[i].b.y){
            printf("BORDER\n");
            return;
        }
        if (s[i].a.x==s[i].b.x){
            x = s[i].a.x;
            y = p.y - p.x + x;
            if (x<=p.x && s[i].a.y<=y && y<=s[i].b.y){
                h[x+MAXN] = 1;
                if (y==s[i].a.y) k[x+MAXN] |= 1;
                    else if (y==s[i].b.y) k[x+MAXN] |= 2;
            }
        }
        else{
            y = s[i].a.y;
            x = p.x - p.y + y;
            if (x<=p.x && s[i].a.x<=x && x<=s[i].b.x){
                //printf("%d %d %d %d\n",s[i].a.x,s[i].a.y,s[i].b.x,s[i].b.y);
                h[x+MAXN] = 1;
                if (x==s[i].a.x) k[x+MAXN] |= 4;
                    else if (x==s[i].b.x) k[x+MAXN] |= 8;
            }
        }
    }
    for (i=p.x;i>=-10000;i--){
        //if (h[i+MAXN]>0) printf("@ %d %d\n",i,k[i+MAXN]);
        if (k[i+MAXN]!=9 && k[i+MAXN]!=6) c += h[i+MAXN];
    }
    //printf("p @ %d %d ",p.x,p.y);
    if (c%2) printf("INSIDE\n");
        else printf("OUTSIDE\n");
}

int main(){
    freopen("sgu124.in","r",stdin);
    int i;
    while (~scanf("%d",&n)){
        for (i=0;i<n;i++){
            scanf("%d%d",&s[i].a.x,&s[i].a.y);
            scanf("%d%d",&s[i].b.x,&s[i].b.y);
            if (s[i].a.x>s[i].b.x || s[i].a.y>s[i].b.y) swap(s[i].a,s[i].b);
        }
        scanf("%d%d",&p.x,&p.y);
        work();
        //break;
    }
    return 0;
}
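
For boundaries with arbitrary edge directions and non-integer coordinates, such as ZIP code polygons, the same even-odd crossing idea can be sketched in Java. This is only a minimal sketch, not a tested GIS implementation: the class name `ZipLookup`, the methods `contains` and `findZip`, and the sample shapes and labels are all made up for illustration. It assumes each area is a single simple ring with no holes, and a point exactly on an edge may be classified either way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Even-odd (ray casting) point-in-polygon test for simple polygons. Illustrative only. */
class ZipLookup {

    /**
     * Returns true if (px, py) lies inside the polygon whose vertices are
     * (xs[i], ys[i]), listed in order around the ring.
     */
    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Does the edge from vertex j to vertex i straddle the line y = py?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                // x coordinate at which the edge crosses that horizontal line
                double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                // Count only crossings to the right of the point
                if (px < xCross) inside = !inside;
            }
        }
        return inside;
    }

    /** Returns the label of the first polygon containing the point, or null if none does. */
    static String findZip(Map<String, double[][]> polygons, double px, double py) {
        for (Map.Entry<String, double[][]> e : polygons.entrySet()) {
            double[][] ring = e.getValue(); // ring[0] = x values, ring[1] = y values
            if (contains(ring[0], ring[1], px, py)) return e.getKey();
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, double[][]> polygons = new LinkedHashMap<>();
        // Made-up shapes with placeholder labels, not real ZIP boundaries.
        polygons.put("ZIP_A", new double[][] {{0, 4, 4, 0}, {0, 0, 4, 4}}); // square
        polygons.put("ZIP_B", new double[][] {{4, 8, 6}, {0, 0, 5}});       // triangle
        System.out.println(findZip(polygons, 1.0, 1.0));
        System.out.println(findZip(polygons, 6.0, 2.0));
        System.out.println(findZip(polygons, 9.0, 9.0));
    }
}
```

With a couple hundred polygons, a linear scan like `findZip` is probably fast enough for tens of thousands of points. Checking each polygon's bounding box before calling `contains` would be a cheap way to skip most candidates if it turns out not to be.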

answered Sep 29 '22 20:09 by iloahz