Is web scraping allowed? [closed]

Tags:

web-scraping

I'm working on a project that requires certain statistics from another website, and I've created an HTML scraper that gets this data every 15 minutes, automatically. However, I stopped the bot now, as in their terms of use, they mention they do not allow it.

I really want to respect this, especially if there's a law prohibiting me from taking this data. However, I've contacted them by email several times without a single answer, so I've come to the conclusion that I'll simply grab the data, if it is legal.

On certain forums I've read that it IS legal, but I would much rather get a more "precise" answer here on StackOverflow.

And let's say that this is in fact not illegal, would they have any software to spot my bot making several connections every 15 minutes?

Also, when talking about taking their data, we're talking about a single number for each "team", which I will convert into our own number.

740

asked Sep 06 '15 23:09

Mikkel

2 Answers

I'll quote the answer by Pablo Hoffman (Scrapinghub co-founder) to "What is the legality of web scraping?", which I found on another site:

First things first: I am not a lawyer and these comments are solely based on my experience working at Scrapinghub, please seek legal assistance accordingly.

Here are a few things to consider when scraping public data from websites (note that the following addresses only US law):

As long as they don't crawl at a disruptive rate, scrapers do not breach any contract (in the form of terms of use) or commit a crime (as defined in the Computer Fraud and Abuse Act).

A website's user agreement is not enforceable as a browsewrap agreement because companies do not provide sufficient notice of the terms to site visitors.

Scrapers access website data as a visitor would, following paths similar to those of a search engine. This can be done without registering as a user (and explicitly accepting any terms).

In Nguyen v. Barnes & Noble, Inc. the courts ruled that simply placing a link to the terms of use at the bottom of a webpage is not sufficient to "give rise to constructive notice." In other words, there is nothing on a public page that would imply that merely accessing the information is subject to any contractual terms. Scrapers give neither explicit nor implicit assent to any agreement and therefore breach no contract.

Social networks, for example, define the value of becoming a user (based on the call-to-action on their public pages) as the ability to: i) gain access to full profiles, ii) identify common friends/connections, iii) get introduced to others, and iv) contact members directly. As long as scrapers make no attempt to perform any of these actions, they do not gain "unauthorized access" to these services and thus do not violate the CFAA.

A thorough evaluation of the legal issues involved can be seen here: http://www.bna.com/legal-issues-raised-by-the-use-of-web-crawling-and-scraping-tools-for-analytics-purposes
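To illustrate the point about not crawling at a disruptive rate, here is a minimal sketch of a throttle that enforces a minimum pause between requests. The class name `Throttle`, its parameters, and the chosen interval are assumptions made for illustration; nothing here comes from the quoted answer, and what counts as "disruptive" depends on the site.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls.

    Hypothetical helper for illustration only. `clock` and `sleep` can be
    replaced, which makes the class easy to test without real waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling `throttle.wait()` before each request keeps the requests spaced out, so a scraper that fetches several pages per run would not hit the server with a burst of simultaneous connections.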

answered Sep 19 '22 23:09

Andrés Pérez-Albela H.

There should be a robots.txt file in the root folder of that site.

It specifies which paths scrapers are forbidden to crawl and which are allowed (along with acceptable delays between requests).

If that file doesn't exist, anything is allowed, and you bear no responsibility for the website owner's failure to provide that info.

Also, here you can find some explanation of the robots exclusion standard.
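As a rough sketch of how such a check could look, Python's standard library ships `urllib.robotparser`. The robots.txt content, the bot name `stats-bot`, and the `example.com` URLs below are made up for illustration; a real scraper would load the file from the target site (for example with `set_url()` followed by `read()`). Note that `Crawl-delay` is a non-standard extension that not every site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt):
    """Return a RobotFileParser loaded from the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Path not covered by any Disallow rule: fetching is permitted.
allowed = parser.can_fetch("stats-bot", "https://example.com/teams/1")

# Path under /private/: fetching is not permitted.
blocked = parser.can_fetch("stats-bot", "https://example.com/private/data")

# Requested delay between requests in seconds, or None if not specified.
delay = parser.crawl_delay("stats-bot")

print(allowed, blocked, delay)
```

This only tells you what the site owner asks of crawlers; it does not settle the legal question by itself.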

answered Sep 19 '22 23:09

ankhzet
