The webpage shows that there are 702 Comments.
target youtube sample
I write a function get_total_youtube_comments(url)
,many codes copied from the project on github.
project on github
def get_total_youtube_comments(url):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--headless")
driver = webdriver.Chrome(options=options,executable_path='/usr/bin/chromedriver')
wait = WebDriverWait(driver,60)
driver.get(url)
SCROLL_PAUSE_TIME = 2
CYCLES = 7
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)
for i in range(CYCLES):
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)
comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
return all_comments
Try to parse all comments on a sample webpage https://www.youtube.com/watch?v=N0lxfilGfak
.
url='https://www.youtube.com/watch?v=N0lxfilGfak'
list = get_total_youtube_comments(url)
It can get some comments ,only small party of all comments.
len(list)
60
60
is much less than 702
,how to get all comments in youtube with selenium?
@supputuri,i can extract all comments with your code.
comments_list = driver.find_elements_by_xpath("//*[@id='content-text']")
len(comments_list)
709
print(driver.find_element_by_xpath("//h2[@id='count']").text)
717 Comments
comments_list[-1].text
'mistake at 23:11 \nin NOT it should return false if x is true.'
comments_list[0].text
'Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Python Course curriculum, Visit our Website: Use code "YOUTUBE20" to get Flat 20% off on this training.'
Why the comments number is 709 instead of 717 shown in page?
You are getting a limited number of comments as YouTube will load the comments as you keep scrolling down. There are around 394 comments left on that video you have to first make sure all the comments are loaded and then also expand all View Replies
so that you will reach the max comments count.
Note: I was able to get 700 comments using the below lines of code.
# get the last comment
lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
# scroll to the last comment currently loaded
lastEle.location_once_scrolled_into_view
# wait until the comments loading is done
WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))
# load all comments
while lastEle != driver.find_element_by_xpath("(//*[@id='content-text'])[last()]"):
lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
driver.find_element_by_xpath("(//*[@id='content-text'])[last()]").location_once_scrolled_into_view
time.sleep(2)
WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))
# open all replies
for reply in driver.find_elements_by_xpath("//*[@id='replies']//paper-button[@class='style-scope ytd-button-renderer'][contains(.,'View')]"):
reply.location_once_scrolled_into_view
driver.execute_script("arguments[0].click()",reply)
time.sleep(5)
WebDriverWait(driver, 30).until(
EC.invisibility_of_element((By.CSS_SELECTOR, "div.active.style-scope.paper-spinner")))
# print the total number of comments
print(len(driver.find_elements_by_xpath("//*[@id='content-text']")))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With