I am trying to search within Google Cache, so I need to send this query:
http://webcache.googleusercontent.com/search?q=cache:news.ycombinator.com/news+hacker+news
I then want to extract some content from the page, such as its timestamp. But when I do this using curl (from Ruby), I get a permission denied error, i.e. scraping is blocked, which was expected.
So, is there any way to search Google Cache (either through an API or some kind of scraping workaround) and extract information such as the timestamp?
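A minimal sketch of the request from the question, using only Ruby's standard library (`net/http` and `uri`) instead of shelling out to curl. The helper names `cache_url` and `fetch_cache` are hypothetical. The browser-like `User-Agent` header is an assumption: Google may still answer automated requests with a 403 or a CAPTCHA page. Google has also since retired its public cache (in 2024), so this endpoint may no longer return snapshots at all.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds the cache URL from the question, i.e.
# cache:<page> followed by optional extra search terms ("hacker news").
def cache_url(page, terms = [])
  uri = URI("http://webcache.googleusercontent.com/search")
  uri.query = URI.encode_www_form(q: (["cache:#{page}"] + terms).join(" "))
  uri
end

# Hypothetical helper: fetches the cached copy and returns the HTML body,
# raising with the HTTP status when Google refuses (e.g. 403 Forbidden).
def fetch_cache(uri)
  request = Net::HTTP::Get.new(uri)
  # Assumption: a browser-like User-Agent; Google may block the request anyway.
  request["User-Agent"] = "Mozilla/5.0"
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "Google returned #{response.code} #{response.message}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```

For example, `cache_url("news.ycombinator.com/news", %w[hacker news])` yields `http://webcache.googleusercontent.com/search?q=cache%3Anews.ycombinator.com%2Fnews+hacker+news`, which is the question's URL with `:` and `/` percent-encoded. Passing it to `fetch_cache` makes the actual network request.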
In Google Search, click the small arrow next to a result's URL, and a menu will appear with a single option: “Cached.” Click that link to see a cached version of the page. You'll see a banner at the top with the date and time the snapshot was taken and a link to the current page. Another simple method is to type “cache:URL” in the search bar.
Detecting the Cache API: in modern browsers, each origin has its own cache storage, which you can inspect by opening the browser developer tools. In Chrome: Application > Cache > Cache Storage.
The cache: operator is a search operator that you can use to find the cached version of a page. Google generates a cached version so that users can still access the web page, for example, if the site isn't available. The cache: operator is only available on web search.
I didn't find any API, but I can scrape the page using Hpricot or Nokogiri in Rails (curl from Rails gives the permission denied error). I will post the code once I figure out how to extract the timestamp from the above URL with these gems.
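A minimal sketch of the timestamp extraction, assuming the cached page still carries Google's banner text of the form "It is a snapshot of the page as it appeared on 3 Mar 2013 10:14:12 GMT." (the exact wording and date format are assumptions). It swaps Nokogiri for the standard library: tags are stripped with a crude regex and the date string is handed to `Time.parse`. With Nokogiri installed, `Nokogiri::HTML(html).text` would be a sturdier way to get the page text. `snapshot_time` and `SNAPSHOT_PATTERN` are hypothetical names.

```ruby
require "time"

# Assumption: the cache banner contains "... as it appeared on <date> GMT."
SNAPSHOT_PATTERN = /as it appeared on\s+(.+?\s+GMT)/

# Hypothetical helper: returns the snapshot time as a UTC Time,
# or nil when the banner text is not found.
def snapshot_time(html)
  # Crude tag stripping, then collapse all whitespace (including newlines)
  # so the banner sentence reads as one line for the regex.
  text = html.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ")
  match = SNAPSHOT_PATTERN.match(text)
  return nil unless match

  Time.parse(match[1]).utc
end
```

For example, `snapshot_time("<div>It is a snapshot of the page as it appeared on 3 Mar 2013 10:14:12 GMT.</div>")` returns the time `2013-03-03 10:14:12 UTC`, and a page without the banner returns `nil`.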
Does anyone have a better solution?