
Zip Code Demographics in R

Tags: r, census

I could get at my goals "the long way", but I am hoping to stay completely within R. I want to append Census demographic data, by zip code, to records in my database. I know that R has a few Census-based packages but, unless I am missing something, their data do not seem to be available at the zip code level, and it is not obvious how to merge them onto an existing data frame.

In short, is it possible to do this within R, or is my best approach to grab the data elsewhere and read it into R?

Any help will be greatly appreciated!

Btibert3 asked Jun 01 '11 00:06




2 Answers

In short, no. Census-to-zip-code translations are generally created from proprietary sources.

It's unlikely that you'll find anything at the zip code level from a census perspective, for privacy reasons. However, that doesn't mean you're left out in the cold. You can take the zip codes that you have and append census data at the MSA, muSA or CSA level. All you need is a listing of the postal codes within each MSA, muSA or CSA so that you can merge on it. If you don't already have such a list, there are several available online that are fairly cheap.
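The merge described above could look roughly like the following sketch in base R. All three data frames, their column names, the area codes and the income figures are made up for illustration; a real crosswalk would come from whichever listing you obtain.

```r
# Hypothetical records keyed by ZIP code (kept as character to preserve leading zeros).
records <- data.frame(
  id  = 1:4,
  zip = c("02108", "02139", "10001", "59001"),
  stringsAsFactors = FALSE
)

# Hypothetical ZIP-to-MSA crosswalk (placeholder area codes, not real MSA identifiers).
zip_to_msa <- data.frame(
  zip      = c("02108", "02139", "10001"),
  msa_code = c("MSA_A", "MSA_A", "MSA_B"),
  stringsAsFactors = FALSE
)

# Hypothetical MSA-level census table (made-up values).
msa_census <- data.frame(
  msa_code      = c("MSA_A", "MSA_B"),
  median_income = c(100, 200),
  stringsAsFactors = FALSE
)

# Left joins (all.x = TRUE) keep every original record; unmatched rows get NA.
with_msa    <- merge(records, zip_to_msa, by = "zip", all.x = TRUE)
with_census <- merge(with_msa, msa_census, by = "msa_code", all.x = TRUE)

# merge() reorders rows, so restore the original record order.
with_census <- with_census[order(with_census$id), ]
```

Using `all.x = TRUE` means a record whose zip code is missing from the crosswalk is kept with `NA` values rather than silently dropped, which makes gaps in the listing easy to spot.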

For example, in Canada we can get income data from the CRA at the FSA level (the FSA is the first three characters of a postal code of the form A1A 1A1). I'm not sure whether the IRS provides similar information, and I'm not too familiar with US Census data, but I imagine it is available at the CSA level at the very least.
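As a small illustration of the Canadian case, the FSA can be derived from a postal code in base R. The postal codes below are only sample strings, and the cleaning step assumes the input may contain stray spaces or lower-case letters.

```r
# Sample postal codes with inconsistent spacing and case (illustrative values only).
postal_codes <- c("A1A 1A1", "k1a0b1", " M5V 2T6")

# Drop anything that is not a letter or digit, then upper-case.
cleaned <- toupper(gsub("[^A-Za-z0-9]", "", postal_codes))

# The FSA is the first three characters of the cleaned postal code.
fsa <- substr(cleaned, 1, 3)
```

The resulting `fsa` vector can then serve as the merge key against an FSA-level table, in the same way a zip code would.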

If you're bewildered by all these acronyms:

  1. MSA: http://en.wikipedia.org/wiki/Metropolitan_Statistical_Area
  2. CSA: http://en.wikipedia.org/wiki/Combined_statistical_area
  3. muSA: http://en.wikipedia.org/wiki/Micropolitan_Statistical_Area
Brandon Bertelsen answered Nov 11 '22 21:11



As others in this thread have mentioned, the Census Bureau American FactFinder is a free source of comprehensive and detailed data. Unfortunately, it’s not particularly easy to use in its raw format.

We’ve pulled, cleaned, consolidated, and reformatted the Census Bureau data. The details of this process and how to use the data files can be found on our team blog.

None of these tables actually has a field called “ZIP code”. Rather, they have a field called “ZCTA5”. A ZCTA5 (or ZCTA) can be thought of as interchangeable with a ZIP code, given the following caveats:

  • There are no ZCTAs for PO Box ZIP codes, which means that for the 42,000 US ZIP codes there are only 32,000 ZCTAs.
  • ZCTA stands for ZIP Code Tabulation Area. ZCTAs are based on ZIP codes but don’t necessarily follow exact ZIP code boundaries. If you would like to read more about ZCTAs, please refer to this link. The Census Bureau also provides an animation that shows how ZCTAs are formed.
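Given those caveats, appending ZCTA-level data to existing records might be sketched as follows in base R. The `records` and `zcta_data` frames, the `population` column and all values are hypothetical; only the `ZCTA5` field name comes from the tables described above.

```r
# Hypothetical records; ZIP codes read in as numbers lose their leading zeros.
records <- data.frame(id = 1:3, zip = c(2108, 10001, 59001))

# Rebuild five-character, zero-padded strings so they can match ZCTA5 values.
records$zcta5 <- sprintf("%05d", as.integer(records$zip))

# Hypothetical ZCTA-level table (made-up population figures).
zcta_data <- data.frame(
  ZCTA5      = c("02108", "10001"),
  population = c(10, 20),
  stringsAsFactors = FALSE
)

# Left join on the padded code; records without a ZCTA stay in the result with NA.
merged <- merge(records, zcta_data, by.x = "zcta5", by.y = "ZCTA5", all.x = TRUE)

# ZIP codes with no ZCTA match (PO Box ZIP codes would land here).
unmatched <- merged[is.na(merged$population), "zip"]
```

Checking `unmatched` after the merge gives a quick sense of how many records fall into the PO Box gap, or otherwise lack a ZCTA, before any analysis is run.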
kstern answered Nov 11 '22 22:11
