
Generating test data - how to generate a valid address for a given US zipcode?

I am creating a tool which depends on addresses. For the purposes of testing, I'd like to create a large number of valid US addresses. I have the GeoNames postal code data and I would like to generate some number of real addresses for each of the ~41,000 zip codes in the United States.

I've found sites like FakeAddressGenerator and FakeName which claim to generate random, valid US addresses. How do these sites work? How can I do the same thing without relying on scraping these websites?

Ideally, I'd like to be able to do this in Python; utilizing a web service is fine (it doesn't seem that either FakeAddressGenerator or FakeName provide such a web service).

Thanks!

Joseph asked Apr 03 '18 20:04




2 Answers

Googling your issue, I found two links of interest:

  1. https://github.com/EthanRBrown/rrad, which provides approximately 3200 real, anonymised addresses.
  2. https://openaddresses.io, which also links to their open-source GitHub repository with the complete data set.

I don't recommend scraping the fake address generators, as they do not guarantee that the addresses exist. I would not sample Google Maps either, as you will almost certainly get blacklisted.

Extracting the data from the zip files downloaded from link 2 is easy: each archive contains CSV files with the full address, zip code, lat, lon, etc.
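As a rough illustration of that extraction step, here is a minimal Python sketch that reads every CSV inside a downloaded archive and groups the rows by zip code. The `addresses_by_zip` helper and the `POSTCODE` column name are assumptions for illustration; check the header of the files you actually download and adjust `postcode_field` if it differs.

```python
import csv
import io
import zipfile
from collections import defaultdict


def addresses_by_zip(archive, postcode_field="POSTCODE"):
    """Group address rows from every CSV inside a zip archive by 5-digit zip code.

    `archive` is a path or a binary file-like object.
    `postcode_field` is an assumed column name, not a guaranteed one.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip READMEs, licence files, etc.
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    postcode = (row.get(postcode_field) or "").strip()
                    if postcode:  # rows without a postcode are dropped
                        # Keep only the first five characters so ZIP+4 values
                        # land in the same bucket as plain 5-digit codes.
                        grouped[postcode[:5]].append(row)
    return grouped
```

Calling `addresses_by_zip("some-download.zip")` (a placeholder file name) returns a dictionary keyed by zip code, from which you can pick as many rows per code as you need. Zip codes that never appear as keys are the ones the data set does not cover.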

The two data sets above "guarantee" that the addresses exist. I don't know how strict your other condition is, namely having at least one valid address for each of the 41k zip codes. If it is a hard constraint, I doubt you will find an open-source data set that satisfies it.


EDIT:

If you have a list of all postcodes in the US, a fully automatable solution is to use Nominatim, the OpenStreetMap geocoding service (subject to their terms and conditions!).

1) Get the lat, lon (centre point or default address) of each postcode:

https://nominatim.openstreetmap.org/search/?format=xml&addressdetails=1&limit=1&country_codes=us&postalcode=35051

2) Get the address related to this lat, lon:

https://nominatim.openstreetmap.org/reverse?format=xml&lat=33.178764&lon=-86.619038&zoom=18&addressdetails=1

Trying this example for Columbiana, Alabama (postcode 35051) yields 397 West College Street.

Nominatim documentation is at: https://wiki.openstreetmap.org/wiki/Nominatim
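The two steps above could be chained in Python roughly as follows. This is a sketch under assumptions, not a tested client: it requests JSON rather than XML, the function names and the `USER_AGENT` string are made up for illustration, and it writes the country filter as `countrycodes`, which is the spelling used in the Nominatim API documentation (the example URL above spells it `country_codes`). The public server's usage policy asks for an identifying User-Agent and at most one request per second, hence the pause between calls.

```python
import json
import time
import urllib.parse
import urllib.request

BASE = "https://nominatim.openstreetmap.org"
# Hypothetical value: replace with something that identifies your application.
USER_AGENT = "address-test-data-sketch/0.1 (you@example.com)"


def search_url(zip_code):
    """URL for step 1: look up one US postcode."""
    query = urllib.parse.urlencode({
        "format": "json", "postalcode": zip_code, "countrycodes": "us",
        "limit": 1, "addressdetails": 1,
    })
    return f"{BASE}/search?{query}"


def reverse_url(lat, lon):
    """URL for step 2: reverse-geocode a point at building level."""
    query = urllib.parse.urlencode({
        "format": "json", "lat": lat, "lon": lon,
        "zoom": 18, "addressdetails": 1,
    })
    return f"{BASE}/reverse?{query}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def address_for_zip(zip_code, fetch=fetch_json, pause=1.0):
    """Return the address dict near the postcode's centre, or None."""
    hits = fetch(search_url(zip_code))
    if not hits:
        return None  # the postcode is unknown to the server
    time.sleep(pause)  # stay under one request per second
    place = fetch(reverse_url(hits[0]["lat"], hits[0]["lon"]))
    return place.get("address")  # absent when the reverse lookup fails
```

`address_for_zip("35051")` would perform both requests for the Columbiana example. The `fetch` parameter exists so that the HTTP call can be swapped out, for example for a cached or self-hosted lookup when running over all ~41,000 zip codes.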

Lynx-Lab answered Oct 12 '22 11:10


You can install random-address:

pip install random-address

And then use random_address.real_random_address_by_postal_code:

>>> import random_address
>>> random_address.real_random_address_by_postal_code('32409')
{'address1': '711 Tashanna Lane', 'address2': '', 'city': 'Southport', 'state': 'FL', 'postalCode': '32409', 'coordinates': {'lat': 30.41437699999999, 'lng': -85.676568}}
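
To collect several addresses for each zip code in a list, a small wrapper along these lines might help. It is only a sketch: `sample_addresses` is a made-up helper, it takes the lookup function as an argument rather than importing the package itself, and it assumes that an empty result means there is no data for that zip code (worth verifying against the package's actual behaviour).

```python
def sample_addresses(zip_codes, lookup, per_zip=3, max_tries=20):
    """Collect up to `per_zip` distinct addresses for each zip code.

    `lookup` is any callable that takes a zip code string and returns an
    address dict shaped like the output shown above.
    """
    results = {}
    for zip_code in zip_codes:
        seen = {}
        for _ in range(max_tries):
            if len(seen) >= per_zip:
                break
            address = lookup(zip_code)
            if not address:  # assumption: empty result means no data here
                break
            # Random lookups can repeat, so deduplicate on street line + zip.
            key = (address.get("address1"), address.get("postalCode"))
            seen[key] = address
        results[zip_code] = list(seen.values())
    return results
```

With the package installed, usage would look like `sample_addresses(['32409'], random_address.real_random_address_by_postal_code)`. Zip codes that map to an empty list are the ones the lookup could not serve.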

neosergio answered Oct 12 '22 10:10