So I've been working on a scraper that visits 10k+ pages and extracts data from each of them.
The issue is that memory consumption rises drastically over time. To work around this, instead of closing the driver instance only at the end of the scrape, I updated the scraper so that it closes the instance after every page is loaded and its data extracted.
But RAM usage still keeps growing for some reason.
I tried using PhantomJS, but it doesn't load the data properly for some reason. I also tried limiting the Firefox cache to 100 MB in the initial version of the scraper, but that did not work either.
Note: I run tests with both chromedriver and Firefox, and unfortunately I can't use libraries such as requests, mechanize, etc. instead of Selenium.
Any help is appreciated since I've been trying to figure this out for a week now. Thanks.
The difference between quit() and close(): driver.quit() quits the driver, closing every associated window. driver.close() closes the currently focused window, quitting the driver if the current window is the only open window.
close() closes only the current window on which Selenium is running automated tests. The WebDriver session, however, remains active. On the other hand, the driver.quit() method closes all browser windows and ends the WebDriver session.
Selenium WebDriver scripts are slow because they drive a real browser. Several things can improve their speed, for example using fast selectors and using fewer locators.
close() will close only the current Chrome window. browser.quit() should close all of the open windows, then exit WebDriver.
The only way to force the Python interpreter to release memory to the OS is to terminate the process. Therefore, use multiprocessing to spawn the Selenium Firefox instance; the memory will be freed when the spawned process is terminated:
    import multiprocessing as mp
    import selenium.webdriver as webdriver

    def worker():
        driver = webdriver.Firefox()
        try:
            # do memory-intensive work
            pass
        finally:
            # quitting is not what ultimately frees the memory, but it is
            # good to end the WebDriver session gracefully anyway; quit()
            # closes every window, so a separate close() is not needed.
            driver.quit()

    if __name__ == '__main__':
        p = mp.Process(target=worker)
        # run `worker` in a subprocess
        p.start()
        # make the main process wait for `worker` to end
        p.join()
        # all memory used by the subprocess will be freed to the OS
See also: "Why doesn't Python release the memory when I delete a large object?"
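For a crawl of 10k+ pages, the same idea can be applied per batch rather than once for the whole run. The following is only a minimal sketch under some assumptions: the helper names (chunked, scrape_batch, scrape_in_batches), the batch size of 50, and the use of driver.title as the "extracted data" are all hypothetical placeholders, not part of the original scraper. It relies on the maxtasksperchild argument of multiprocessing.Pool, which retires a worker process after a fixed number of tasks, so each batch runs in a fresh process with a fresh browser.

```python
import multiprocessing as mp


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def scrape_batch(urls):
    """Scrape one batch with its own browser; meant to run in a child process."""
    # Imported lazily so that only the child process loads Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        results = []
        for url in urls:
            driver.get(url)
            # Placeholder extraction (assumption): swap in the real parsing.
            results.append((url, driver.title))
        return results
    finally:
        # End the WebDriver session and close the browser for this batch.
        driver.quit()


def scrape_in_batches(urls, batch_size=50, batch_fn=scrape_batch):
    """Run `batch_fn` on each batch in a worker that is replaced after every batch."""
    # maxtasksperchild=1 makes the pool retire the worker after a single
    # task, so whatever memory a batch accumulated goes back to the OS
    # when that worker process exits.
    with mp.Pool(processes=1, maxtasksperchild=1) as pool:
        results = []
        for batch_result in pool.imap(batch_fn, chunked(urls, batch_size)):
            results.extend(batch_result)
        return results
```

As with the worker example above, scrape_in_batches(urls) should be called from under an if __name__ == '__main__': guard. A smaller batch size trades more browser start-up time for a lower memory ceiling, so the right value probably has to be found by experiment.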