I'm implementing a TikTok crawler using selenium and scrapy
start_urls = ['https://www.tiktok.com/trending']
....
def parse(self, response):
options = webdriver.ChromeOptions()
from fake_useragent import UserAgent
ua = UserAgent()
user_agent = ua.random
options.add_argument(f'user-agent={user_agent}')
options.add_argument('window-size=800x841')
driver = webdriver.Chrome(chrome_options=options)
driver.get(response.url)
The crawler open Chrome but it does not load videos. Image loading
The same problem happens also using Firefox No loading page using Firefox
The same problem using a simple script using Selenium
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get("https://www.tiktok.com/trending")
time.sleep(10)
driver.close()
driver = webdriver.Chrome()
driver.get("https://www.tiktok.com/trending")
time.sleep(10)
driver.close()
Selenium doesn't work (gets detected) for sites like Instagram and TikTok.
The answer is YES! Websites can detect the automation using JavaScript experimental technology navigator. webdriver in the navigator interface. If the website is loaded with automation tools like Selenium, the value of navigator.
The best way to get blocked is to use a proxy while running selenium headlessly. Remember Tiktok/Facebook know something is up the minute you go headless, cause they can see fag that. If you are not using a proxy, they generally allow a good flow of requests before slowing and eventually shutting down responses…
A python package selenium-stealth to prevent detection. This programme is trying to make python selenium more stealthy. As of now selenium-stealth only support Selenium Chrome. After using selenium-stealth you can prevent almost all selenium detections.
Did u try to navigate further within the selenium browser window? If an error 404 appears on following sites, I have a solution that worked for me:
I simply changed my User-Agent to "Naverbot" which is "allowed" by the robots.txt file from Tik Tok
(Robots.txt)
After changing that all sites and videos loaded properly.
Other user-agents that are listed under the "allow" segment should work too, if you want to add a rotation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With