 

What free or paid search APIs allow programmatic querying and caching/storage of the resulting data?

If you've done any serious research into search APIs, you know that most of them come with a long list of TOS/TOU restrictions that make them nearly impossible to use in anything but the most trivial applications.

Bing's 2.0 API, Yahoo Search BOSS, Google Places, Google AJAX Search (now dead), and the like are far too restrictive for us. I need to run a finite and relatively small number of queries (perhaps 500k) one time only, storing specific data from the results for use within our application.

For example, we need to match business names with their websites (we have written an algorithm to make a 'best guess' from a set of results if necessary; we just need a vanilla result set). We also need to match an address to each company.
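To make the matching step concrete, here is a minimal sketch of one way a 'best guess' could be made from a plain list of result URLs. It is not the algorithm mentioned above, which isn't shown; the function names, the suffix list, and the scoring rule (string similarity between the normalized business name and the domain label) are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical list of legal suffixes to ignore when comparing names.
_SUFFIXES = {"inc", "llc", "ltd", "co", "corp", "company"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and common legal suffixes, join the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "".join(w for w in words if w not in _SUFFIXES)


def domain_label(url: str) -> str:
    """Return the second-to-last host label, e.g. 'acmeplumbing' for www.acmeplumbing.com.

    Naive on purpose: it does not handle multi-part suffixes such as .co.uk.
    """
    host = urlparse(url).hostname or ""
    parts = [p for p in host.split(".") if p and p != "www"]
    if len(parts) >= 2:
        return parts[-2]
    return parts[0] if parts else ""


def best_guess(business_name: str, result_urls: List[str]) -> Optional[Tuple[str, float]]:
    """Return (url, score) for the URL whose domain best matches the name, or None."""
    target = normalize(business_name)
    scored = [
        (url, SequenceMatcher(None, target, domain_label(url)).ratio())
        for url in result_urls
    ]
    return max(scored, key=lambda pair: pair[1], default=None)


results = [
    "https://www.yelp.com/biz/acme-plumbing",
    "http://www.acmeplumbing.com/contact",
    "https://en.wikipedia.org/wiki/Acme",
]
guess = best_guess("Acme Plumbing, Inc.", results)
```

In practice a score threshold would probably be needed so that directory sites are not picked when no result really matches.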

Unfortunately, I can find ZERO search APIs that will allow us to fire off queries in a programmatic, non-user-initiated manner.

We're even quite eager to give someone cold, hard cash for access to this kind of data; Google, Bing, Yahoo, and others simply seem to not want our money (as evidenced by their TOSes)...

Any thoughts?

rinogo asked Aug 31 '11 23:08




1 Answer

Common Crawl offers a freely accessible crawl of about 5 billion web pages, along with their page rank, link graphs, and other metadata, hosted on Amazon S3.

http://commoncrawl.org/

Their Terms of Use are fairly reasonable and impose few restrictions:

http://commoncrawl.org/about/terms-of-use/

seanieb answered Jan 04 '23 03:01
