What is the "Bytespider" user agent? [closed]

Sample user agent strings:

Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.1511.1269 Mobile Safari/537.36; Bytespider

Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.7997.1233 Mobile Safari/537.36; Bytespider

Gokula Kannan asked Sep 12 '19 14:09


2 Answers

We were seeing the same thing: a reasonably small set of Android/iOS user agents, all ending with Bytespider, and all ignoring our robots.txt files. One of our platform engineers had the bright idea of running a reverse DNS lookup on the IP addresses in their cluster.

The result: the crawler appears to belong to ByteDance (https://bytedance.com/).

Given they don't respect the robots.txt file, I'd consider them block-fodder.
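
For anyone who wants to repeat the reverse DNS check on their own logs, here is a minimal sketch in Python. It only assumes the standard library's socket.gethostbyaddr; the function names reverse_dns and lookup_all and the injectable resolver parameter are made up for illustration, and the hostnames you get back depend entirely on how the crawler's network has set up its PTR records.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def reverse_dns(ip: str, resolver: Callable = socket.gethostbyaddr) -> Optional[str]:
    """Return the PTR hostname for an IP address, or None if there is none.

    `resolver` is a hypothetical hook so the lookup can be swapped out
    (for example in tests); by default the real DNS lookup is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        # No PTR record, or the resolver could not be reached.
        return None


def lookup_all(ips: Iterable[str], resolver: Callable = socket.gethostbyaddr) -> Dict[str, Optional[str]]:
    """Map each distinct IP address to its reverse DNS hostname (or None)."""
    return {ip: reverse_dns(ip, resolver) for ip in set(ips)}
```

Feeding lookup_all the client addresses seen with a Bytespider user agent should show whether they resolve to one owner. A PTR record can be set to anything by whoever controls the address block, so treat the result as a hint rather than proof.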

James answered Sep 21 '22 13:09


I'm seeing this on my website as well. Every second it issues GET requests for nonexistent pages. I resorted to returning a 403 HTTP status code whenever "bytespider" appears in the user agent string, and to blocking IP addresses in the firewall (adding them periodically based on server logs). The majority of the requests come from IP addresses owned by Chinese and Singaporean ISPs, as well as Cloudflare.
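
The user agent check can be sketched as a small WSGI middleware. This is not the exact setup used on my server, just an illustration: the names BLOCKED_TOKEN, is_blocked_agent and BlockBytespider are made up, and the match is assumed to be case-insensitive because the bot sends "Bytespider" with a capital B.

```python
BLOCKED_TOKEN = "bytespider"  # assumed token; compared case-insensitively


def is_blocked_agent(user_agent: str) -> bool:
    """Return True if the user agent string contains the blocked token."""
    return BLOCKED_TOKEN in user_agent.lower()


class BlockBytespider:
    """WSGI middleware that answers 403 to blocked user agents."""

    def __init__(self, app):
        self.app = app  # the wrapped WSGI application

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked_agent(user_agent):
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Everything else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing application is one line, e.g. application = BlockBytespider(application). The same check can of course be done in the web server configuration instead, which saves the request from reaching the application at all.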

Sample requests:

172.69.22.98 - - [30/Sep/2019:13:16:10 +0000] "GET /CloudHD/interview-of-riyaz-14-bestfriend-secret-reveals-with-proof-yaari-hai/ZVRmSmlTQlFaRDQ.html HTTP/1.1" 403 571 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.5653.1247 Mobile Safari/537.36; Bytespider"
172.68.142.101 - - [30/Sep/2019:13:18:12 +0000] "GET /CloudHD/hot-desi-girl-big-boob-s-in-blouse-nude-selfie/WmVzSi1SOEtXTjg.html HTTP/1.1" 403 571 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.8372.1186 Mobile Safari/537.36; Bytespider"

As you may guess, no paths even remotely resembling these are available on my website. The bot has never even tried to read /robots.txt, so there's no point in blocking it with this method.
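
For the periodic firewall updates, the offending addresses can be pulled out of the access log with a short script. This is a rough sketch that assumes the combined log format shown in the sample requests above; LOG_LINE and bytespider_ips are illustrative names, and lines that do not match the pattern are simply skipped.

```python
import re
from collections import Counter
from typing import Iterable

# Combined log format: ip ident user [time] "request" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bytespider_ips(lines: Iterable[str]) -> Counter:
    """Count requests per client IP whose user agent mentions Bytespider."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not in the expected format; ignore
        if "bytespider" in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts
```

Passing an open log file to bytespider_ips gives a Counter keyed by IP address; most_common() then lists the busiest addresses first, which is a reasonable order in which to review them before adding anything to the firewall.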

The Semrush bot behaved almost identically until I blocked it with /robots.txt. So Bytespider may be the identity it switches to once it is blocked and wants to avoid bad press.

Jakub Alba answered Sep 21 '22 13:09