Python - get text from CSS property “content” on a ::before pseudo element in Selenium?

I am trying to scrape a few elements and return the text displayed on the webpage. I believe I can find the elements fine through CSS selectors and XPaths, but I cannot return the desired text. Here is my program:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
import time
import threading
import pandas as pd

threadLocal = threading.local()

def instantiate_chrome():
    driver = getattr(threadLocal, 'driver', None)

    if driver is None:
        options = webdriver.ChromeOptions()
        options.add_argument('log-level=3')
        options.add_argument('--ignore-certificate-errors')
        options.add_argument('--ignore-ssl-errors')
        driver = webdriver.Chrome(executable_path = r'path/to/chrome', options = options)
        setattr(threadLocal, 'driver', driver)

    return driver

def search_stock(driver, stock):
    search_url = r'https://www.forbes.com/search/?q=' + stock
    driver.get(search_url)
    time.sleep(2)
    driver.find_element_by_xpath(r'/html/body/div[1]/main/div[1]/div[1]/div[4]/div/div[1]/div/div[1]/a[1]').click()

def get_q_score(stock, driver):

    df = pd.DataFrame(columns = ['stock','overall_score','quality', 'momentum','growth','technicals'])
    time.sleep(3)
    overall_score = driver.find_element_by_css_selector(r'.q-factor-total .q-score-bar__grade-label').text
    quality_score = driver.find_element_by_xpath(r'/html/body/div[1]/main/div/div[1]/div[4]/div[2]/div[2]/div[1]/div[2]/div[1]').text

    return print('overall score is '+ overall_score, ' quality score is ' + quality_score)

def main(stock):
    driver = instantiate_chrome()
    print('attempting to get q score for ' + stock)
    search_stock(driver, stock)
    print('found webpage for ' + stock)
    get_q_score(stock, driver)

main('AAPL')

I believe the issue is that I am attempting to scrape the text via Selenium's .text property, but there is no text to scrape. Any thoughts?

Ryan asked Sep 19 '25 01:09


1 Answer

You were on the right path, except that the strings you are after aren't actually text in the DOM. They are rendered by the CSS property content, which is used with the pseudo-elements ::before and ::after. Because pseudo-element content is not part of the element's DOM text, Selenium's .text returns an empty string for it. You can read here on how it works if you are interested.

The text is rendered as icons; organizations sometimes do this to keep sensitive values from being scraped. However, there is a (somewhat hard) way around it: using Selenium and JavaScript, you can read the computed value of the content property for each element, which holds the values you are after.

Having looked into it for an hour, this is the simplest Pythonic way I found of getting the values you want:

overall_score = driver.execute_script("return [...document.querySelectorAll('.q-score-bar__grade-label')].map(div => window.getComputedStyle(div,':before').content)") #key line in the problem

The code runs a JavaScript snippet in the page that selects the elements by class and maps each one to the computed content value of its :before pseudo-element. This returns a list:

['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']

The values correspond, in order, to:

Q-Factor Score/Quality/Momentum/Growth/Technicals
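If you need this for more than one selector, the one-liner can be wrapped in a small helper. The following is only a sketch: the function name get_pseudo_content and the constant name are made up for illustration, and it assumes driver is a Selenium WebDriver (or anything exposing the same execute_script(script, *args) method). The selector and pseudo-element are passed as script arguments rather than pasted into the JavaScript string.

```python
# JavaScript executed in the page: read the computed `content` of a
# pseudo-element for every element matching a CSS selector.
# Selenium exposes extra execute_script arguments as `arguments[...]`.
_PSEUDO_CONTENT_JS = """
const selector = arguments[0];
const pseudo = arguments[1];
return [...document.querySelectorAll(selector)].map(
    el => window.getComputedStyle(el, pseudo).content
);
"""


def get_pseudo_content(driver, selector, pseudo="::before"):
    """Return the raw computed `content` strings, one per matching element.

    Hypothetical helper: `driver` is assumed to expose Selenium's
    execute_script(script, *args).
    """
    return driver.execute_script(_PSEUDO_CONTENT_JS, selector, pseudo)
```

With a live driver on the page in question, calling get_pseudo_content(driver, '.q-score-bar__grade-label') should be equivalent to the one-liner above.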

To access individual values in the list, you can use a for loop or indexing. Note that each value still includes its surrounding double quotes. You can see more on that here.
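As a rough illustration of that last step, the sketch below pairs each returned value with its label and strips the double quotes that getComputedStyle leaves around the string. The names LABELS, clean_content and label_scores are made up for this example, and the quote stripping is deliberately simple (it does not handle escaped quotes or values such as none).

```python
# Label order taken from the answer; the list itself is illustrative.
LABELS = ["Q-Factor Score", "Quality", "Momentum", "Growth", "Technicals"]


def clean_content(raw):
    """Strip the surrounding double quotes from a computed `content` string."""
    return raw.strip('"')


def label_scores(raw_values, labels=LABELS):
    """Pair each label with its cleaned value, in order."""
    return {label: clean_content(raw) for label, raw in zip(labels, raw_values)}


# The list returned by execute_script in the answer above.
raw = ['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']
scores = label_scores(raw)

# Loop over every score ...
for label, grade in scores.items():
    print(f"{label}: {grade}")

# ... or pick out a single one.
print(scores["Quality"])
```

A dict keyed by label avoids having to remember which index belongs to which score.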

AzyCrw4282 answered Sep 20 '25 16:09