 

Prevent bots from downloading my files

I have an ASP.NET download page that sends a file to the client, but I want to prevent robots from downloading this file: it is large, and as I can see from the logs, a bot has downloaded it about 20 times. This slows down the server and wastes bandwidth.

I coded this page to count downloads and to detect the client's .NET Framework, so I can serve a setup file that either includes the .NET Framework or not.

I need some way to prevent Google and other bots from reaching this page.

My download link looks like download.aspx?pack=msp

asked Jun 27 '10 at 20:06 by HasanG


People also ask

Why are bots crawling my site?

If lots of new content is added to your website, search engine bots may crawl it more aggressively in order to index the new content. Alternatively, there could be a problem with your website, and the bots could be triggering a fault that causes a resource-intensive operation, such as an infinite loop.

How can I control bots spiders and crawlers?

One option to reduce server load from bots, spiders, and other crawlers is to create a robots.txt file at the root of your website. This tells search engines what content on your site they should and should not index.
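For example, a minimal robots.txt placed at the site root might look like the following (the /downloads/ path is purely illustrative; the Crawl-delay directive is honored by some crawlers such as Bingbot but ignored by Google):

User-agent: *
Disallow: /downloads/
Crawl-delay: 10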


4 Answers

Yes, add a robots.txt file to your site. It should contain a list of rules (suggestions, really) for how spiders should behave.

Check out this article for more info. Also, for kicks, this is the robots.txt file used by Google.

answered Oct 06 '22 at 05:10 by Martin Wickman


You want a robots.txt file. For example:

User-agent: *
Disallow: /download.aspx

This doesn't forcibly block search engines, but most (including Google) will check for a robots.txt file and follow its instructions.
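Note that Disallow rules are prefix matches, so the rule above also covers parameterized URLs such as download.aspx?pack=msp. Major crawlers such as Googlebot and Bingbot additionally understand * and $ wildcards, so a sketch aimed only at the query-string form of the link could look like this:

User-agent: *
Disallow: /*?pack=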

answered Oct 06 '22 at 05:10 by Michael Mrozek


The correct answer, as noted by the other two people, is to create a robots.txt file so that well-behaved robots don't download things.

However, it is important to know that not all robots are well-behaved and that robots.txt is only advisory. If you have pages that are not publicly linked, do not list them in robots.txt to "protect" them, since some particularly badly behaved robots actually scan the file to find interesting URLs they don't already know about.
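If a bot ignores robots.txt, the only reliable option is to enforce the rule on the server itself. As a minimal sketch for an ASP.NET Web Forms code-behind such as download.aspx (the bot-signature list is illustrative, not exhaustive):

protected void Page_Load(object sender, EventArgs e)
{
    // Reject requests from known crawler User-Agents before streaming the large file.
    string userAgent = Request.UserAgent ?? string.Empty;
    string[] botSignatures = { "Googlebot", "bingbot", "Slurp", "Baiduspider" };

    foreach (string bot in botSignatures)
    {
        if (userAgent.IndexOf(bot, StringComparison.OrdinalIgnoreCase) >= 0)
        {
            Response.StatusCode = 403; // Forbidden: don't serve the file to crawlers
            Response.End();
            return;
        }
    }

    // ...existing download-counting and file-streaming code...
}

Checking the User-Agent is still only a heuristic, since a badly behaved bot can spoof it, but it stops identifiable crawlers without relying on their cooperation.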

answered Oct 06 '22 at 05:10 by Donnie


Where a robots.txt file isn't possible, you can instead decorate your pages with a <meta name="robots" content="noindex"> tag (for the non-HTML file response itself, see the header sketch after this list).

  • Again, as Donnie mentioned, this is just a recommendation for bots and there is no requirement to follow it.

  • Implement a CAPTCHA or login mechanism so that only desirable (human) users can access a protected folder where you keep your biggest files.

  • Instead of providing direct links to content that is easily parsed by bots, use JavaScript on your download link to redirect your users. Many bots won't execute JavaScript, though bot obfuscation is often a moving target.
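The file download itself is not an HTML page, so a meta tag can't be embedded in it; the equivalent for non-HTML responses is the X-Robots-Tag HTTP header, which the major search engines honor. A sketch of sending it from the download.aspx code-behind before writing the file:

// Ask crawlers that do fetch the file not to index it
// (the header equivalent of the meta robots tag).
Response.AppendHeader("X-Robots-Tag", "noindex, nofollow");
// ...then stream the file to the response as usual...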

answered Oct 06 '22 at 04:10 by Laramie