How to get all comments on YouTube with Selenium?

The video page shows that there are 702 comments.
I wrote a function, get_total_youtube_comments(url), with much of the code copied from a project on GitHub.


def get_total_youtube_comments(url):
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    options = webdriver.ChromeOptions()
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--headless")
    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service('/usr/bin/chromedriver'), options=options)
    try:
        driver.get(url)
        SCROLL_PAUSE_TIME = 2
        CYCLES = 7
        html = driver.find_element(By.TAG_NAME, 'html')
        # scroll a little so the comment section starts loading
        html.send_keys(Keys.PAGE_DOWN)
        html.send_keys(Keys.PAGE_DOWN)
        time.sleep(SCROLL_PAUSE_TIME * 3)
        # jump to the bottom a fixed number of times to load more comments
        for _ in range(CYCLES):
            html.send_keys(Keys.END)
            time.sleep(SCROLL_PAUSE_TIME)
        comment_elems = driver.find_elements(By.XPATH, '//*[@id="content-text"]')
        all_comments = [elem.text for elem in comment_elems]
    finally:
        # always close the browser, even if scraping fails
        driver.quit()
    return all_comments

I tried to scrape all the comments on a sample video, https://www.youtube.com/watch?v=N0lxfilGfak:

url='https://www.youtube.com/watch?v=N0lxfilGfak'
comments = get_total_youtube_comments(url)

It retrieves only a small part of the comments:

len(comments)
60

60 is far fewer than 702. How can I get all the comments on a YouTube video with Selenium?
@supputuri, I can extract all the comments with your code:

comments_list = driver.find_elements(By.XPATH, "//*[@id='content-text']")
len(comments_list)
709
print(driver.find_element(By.XPATH, "//h2[@id='count']").text)
717 Comments
comments_list[-1].text
'mistake at 23:11 \nin NOT it should return false if x is true.'
comments_list[0].text
'Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Python Course curriculum, Visit our Website:  Use code "YOUTUBE20" to get Flat 20% off on this training.'

Why is the number of comments 709 instead of the 717 shown on the page?
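Comparing the two numbers in code first requires turning the header text into an integer. The helper below is a hypothetical sketch (`parse_comment_count` is not part of Selenium), assuming the header reads like "717 Comments", possibly with thousands separators such as "1,234 Comments":

```python
import re


def parse_comment_count(header_text: str) -> int:
    """Hypothetical helper: extract N from a header such as '717 Comments'.

    Assumes the count is the first run of digits, optionally grouped with
    commas or periods as thousands separators (e.g. '1,234 Comments').
    """
    match = re.search(r"\d[\d,.]*", header_text)
    if match is None:
        raise ValueError(f"no comment count found in {header_text!r}")
    # Drop the thousands separators before converting.
    return int(re.sub(r"[,.]", "", match.group()))


if __name__ == "__main__":
    header = "717 Comments"  # text read from //h2[@id='count'] in the question
    loaded = 709             # len(comments_list) in the question
    expected = parse_comment_count(header)
    print(f"loaded {loaded} of {expected}, missing {expected - loaded}")
    # loaded 709 of 717, missing 8
```

Abbreviated counts (for example "1.2K") are not handled by this sketch.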

asked Jul 05 '20 13:07 by showkey



1 Answer

You are getting a limited number of comments because YouTube loads comments lazily as you keep scrolling down. There are around 394 comments left on that video, so you first have to make sure all the comments are loaded, and then expand every "View replies" thread so that you reach the maximum comment count.
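The "keep scrolling until everything is loaded" idea can be expressed independently of the browser. A minimal sketch, assuming the number of `#content-text` elements only grows as more comments load; `scroll_until_stable` is a hypothetical helper, not a Selenium API. Instead of a fixed number of scrolls (like the question's `CYCLES = 7`), it stops after `patience` consecutive scrolls that add no comments:

```python
import time
from typing import Callable


def scroll_until_stable(
    get_count: Callable[[], int],
    scroll: Callable[[], None],
    pause: float = 2.0,
    patience: int = 3,
    max_rounds: int = 200,
) -> int:
    """Hypothetical helper: scroll until the loaded-item count stops growing.

    get_count: returns how many comments are currently in the page.
    scroll:    triggers the next lazy load (e.g. scroll to the bottom).
    patience:  consecutive rounds without growth that end the loop, since
               a slow network can leave one round with no new items.
    max_rounds caps the total number of scrolls as a safety net.
    """
    count = get_count()
    stalled = 0
    for _ in range(max_rounds):
        scroll()
        time.sleep(pause)  # give the page time to fetch the next batch
        new_count = get_count()
        if new_count > count:
            count, stalled = new_count, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return count


if __name__ == "__main__":
    # Offline demo with made-up numbers: each scroll reveals 20 more
    # comments until the fake page runs out at 95.
    state = {"loaded": 20}

    def fake_scroll() -> None:
        state["loaded"] = min(state["loaded"] + 20, 95)

    print(scroll_until_stable(lambda: state["loaded"], fake_scroll, pause=0))
```

With Selenium 4, `get_count` could be `lambda: len(driver.find_elements(By.XPATH, "//*[@id='content-text']"))` and `scroll` could be `lambda: driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")`. Whether a plain window scroll triggers YouTube's loader is an assumption; scrolling the last loaded comment into view, as the answer does, is an alternative trigger.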

Note: I was able to get 700 comments using the following code.

import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 API; `driver` is an open Chrome session on the video page
LAST_COMMENT = "(//*[@id='content-text'])[last()]"
SPINNER = (By.CSS_SELECTOR, "div.active.style-scope.paper-spinner")

# get the last comment
lastEle = driver.find_element(By.XPATH, LAST_COMMENT)
# scroll to the last comment currently loaded (reading this property scrolls it into view)
lastEle.location_once_scrolled_into_view
# wait until the comments loading is done
WebDriverWait(driver, 30).until(EC.invisibility_of_element(SPINNER))

# load all comments: keep scrolling while the last comment keeps changing
while lastEle != driver.find_element(By.XPATH, LAST_COMMENT):
    lastEle = driver.find_element(By.XPATH, LAST_COMMENT)
    lastEle.location_once_scrolled_into_view
    time.sleep(2)
    WebDriverWait(driver, 30).until(EC.invisibility_of_element(SPINNER))

# open all replies
for reply in driver.find_elements(By.XPATH, "//*[@id='replies']//paper-button[@class='style-scope ytd-button-renderer'][contains(.,'View')]"):
    reply.location_once_scrolled_into_view
    driver.execute_script("arguments[0].click()", reply)
time.sleep(5)
WebDriverWait(driver, 30).until(EC.invisibility_of_element(SPINNER))
# print the total number of comments
print(len(driver.find_elements(By.XPATH, "//*[@id='content-text']")))
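The "open all replies" loop above clicks each matching button once. If some threads still show a button afterwards (for example if YouTube loads long reply threads in batches, which is an assumption here), re-querying and clicking until nothing matches is one way to cover them. A hedged sketch with a hypothetical `click_until_exhausted` helper; it assumes the button XPath stops matching a thread once that thread is fully expanded:

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def click_until_exhausted(
    find_buttons: Callable[[], Iterable[T]],
    click: Callable[[T], None],
    pause: float = 0.0,
    max_rounds: int = 20,
) -> int:
    """Hypothetical helper (not a Selenium API): click buttons until none match.

    find_buttons re-queries the page on every pass, so buttons revealed by an
    earlier click are picked up and stale references are not reused.
    Returns the total number of clicks; max_rounds guards against a button
    that never disappears.
    """
    clicks = 0
    for _ in range(max_rounds):
        buttons = list(find_buttons())
        if not buttons:
            break
        for button in buttons:
            click(button)
            clicks += 1
        time.sleep(pause)  # let the newly requested replies render
    return clicks


if __name__ == "__main__":
    # Offline demo with made-up data: thread "a" needs two clicks, "b" one.
    pending = {"a": 2, "b": 1}

    def visible_buttons() -> list:
        return [thread for thread, left in pending.items() if left > 0]

    def press(thread: str) -> None:
        pending[thread] -= 1

    print(click_until_exhausted(visible_buttons, press))  # 3 clicks in total
```

With Selenium 4, `find_buttons` might be `lambda: driver.find_elements(By.XPATH, ...)` using the answer's reply-button XPath, and `click` might be `lambda b: driver.execute_script("arguments[0].click()", b)`; a non-zero `pause` gives the replies time to load between passes.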
answered Sep 22 '22 10:09 by supputuri
