If you've done any serious research into search API's, you know that most of them have a huge slew of TOS/TOU restrictions that make them nearly impossible to use in anything but the most inane applications.
Bing's 2.0 API, Yahoo Search BOSS, Google Places, Google AJAX Search (dead), et al, are far too restrictive for us. I need to run a finite and relatively small number of queries (perhaps 500k) one time only, storing specific data from the results for use within our application.
For example, we need to match up business names with their target websites (we have written the algorithm to make a 'best guess' from a set of results if necessary; we just need a vanilla result set). Also, we need to match an address to this company in question.
Unfortunately, I can find ZERO search API's that will allow us to fire off queries in a programmatic, non-user-initiated manner.
We're even quite eager to give someone cold, hard cash for access to this kind of data; Google, Bing, Yahoo, and others simply seem to not want our money (as evidenced by their TOSes)...
Any thoughts?
In API Gateway, you can enable caching for a specified stage. When you enable caching, you must choose a cache capacity. In general, a larger capacity gives a better performance, but also costs more. API Gateway enables caching by creating a dedicated cache instance.
Caching enables us to store copies of frequently accessed data in several places along the request-response path. Today, APIs use caching extensively, and it is also one of the architectural constraints of REST APIs.
When you enable caching for a stage, API Gateway caches responses from your endpoint for a specified time-to-live (TTL) period, in seconds. API Gateway then responds to the request by looking up the endpoint response from the cache instead of making a request to your endpoint.
Caching in REST APIs POST requests are not cacheable by default but can be made cacheable if either an Expires header or a Cache-Control header with a directive, to explicitly allows caching, is added to the response. Responses to PUT and DELETE requests are not cacheable at all.
A freely accessible index of 5 billion web pages, their page rank, their link graphs and other metadata, hosted on Amazon EC2.
http://commoncrawl.org/
Their Terms of Service (or TOU) are pretty reasonable and unrestricted too:
http://commoncrawl.org/about/terms-of-use/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With