
Is web scraping allowed? [closed]

Tags:

web-scraping

I'm working on a project that requires certain statistics from another website, so I created an HTML scraper that fetches this data automatically every 15 minutes. However, I have stopped the bot for now, because their terms of use state that they do not allow it.

I really want to respect this, especially if there is a law prohibiting me from taking this data. However, I have emailed them several times without receiving a single answer, so I have concluded that I will simply grab the data if it is legal.

On certain forums I've read that it IS legal, but I would much rather get a more precise answer here on Stack Overflow.

And supposing this is in fact not illegal, would they have any software that could spot my bot making several connections every 15 minutes?

Also, when I talk about taking their data, I mean a single number for each "team", which I will then convert into our own number.

Mikkel asked Sep 06 '15 23:09



2 Answers

I'll quote Pablo Hoffman's (Scrapinghub co-founder) answer to "What is the legality of web scraping?", which I found on another site:

First things first: I am not a lawyer and these comments are solely based on my experience working at Scrapinghub. Please seek legal assistance accordingly.

Here are a few things to consider when scraping public data from websites (note that the following addresses only US law):

  • As long as they don't crawl at a disruptive rate, scrapers do not breach any contract (in the form of terms of use) or commit a crime (as defined in the Computer Fraud and Abuse Act).
  • A website's user agreement is not enforceable as a browsewrap agreement because companies do not provide sufficient notice of the terms to site visitors.
  • Scrapers access website data as a visitor would, following paths similar to a search engine. This can be done without registering as a user (and explicitly accepting any terms).
  • In Nguyen v. Barnes & Noble, Inc., the court ruled that simply placing a link to the terms of use at the bottom of a webpage is not sufficient to "give rise to constructive notice." In other words, there is nothing on a public page that would imply that merely accessing the information is subject to any contractual terms. Scrapers give neither explicit nor implicit assent to any agreement and therefore breach no contract.
  • Social networks, for example, assign the value of becoming a user (based on the call-to-action on the public page) as the ability to: i) gain access to full profiles, ii) identify common friends/connections, iii) get introduced to others, and iv) contact members directly. As long as scrapers make no attempt to perform any of these actions, they do not gain "unauthorized access" to these services and thus do not violate the CFAA.
  • A thorough evaluation of the legal issues involved can be seen here: http://www.bna.com/legal-issues-raised-by-the-use-of-web-crawling-and-scraping-tools-for-analytics-purposes
Andrés Pérez-Albela H. answered Sep 19 '22 23:09


There should be a robots.txt file in the root folder of that site.

It specifies which paths scrapers are forbidden to crawl and which are allowed (with acceptable timeouts specified).

If that file doesn't exist, anything is allowed, and you bear no responsibility for the website owner's failure to provide that information.


Also, you can find some explanation in the robots exclusion standard.
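
As a rough sketch of how a scraper could check those rules before fetching anything, Python's standard-library urllib.robotparser can parse a robots.txt and report whether a path may be fetched and what crawl delay is requested. The robots.txt content, the user-agent name my-stats-bot, and the example.com URLs below are made up for illustration; a real bot would load the site's actual file instead (for example via RobotFileParser.set_url and read).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the sketch runs offline.
# A real file would be served from the root of the target site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
Allow: /stats/
"""

# Hypothetical bot name; a real scraper should send its own identifying name.
USER_AGENT = "my-stats-bot"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)


def is_allowed(url: str) -> bool:
    """Return True if the parsed rules permit USER_AGENT to fetch url."""
    return rules.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    for url in ("https://example.com/stats/teams",
                "https://example.com/private/data"):
        print(url, "allowed" if is_allowed(url) else "forbidden")
    # The delay (in seconds) the site asks bots to wait between requests.
    print("crawl delay:", rules.crawl_delay(USER_AGENT))
```

Note that this only checks what the site asks of crawlers. It says nothing about the terms of use or the legal questions discussed in the other answer.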

ankhzet answered Sep 19 '22 23:09