Selenium not freeing up memory even after calling close/quit

So I've been working on a scraper that visits 10k+ pages and scrapes data from each of them.

The issue is that memory consumption rises drastically over time. To work around this, instead of closing the driver instance only at the end of the scrape, I updated the scraper so that it closes the instance after every page is loaded and its data extracted.

But RAM still keeps filling up for some reason.

I tried using PhantomJS, but it doesn't load the data properly for some reason. With the initial version of the scraper, I also tried limiting Firefox's cache to 100 MB, but that did not work either.

Note: I run tests with both ChromeDriver and Firefox, and unfortunately I can't use libraries such as requests, mechanize, etc. instead of Selenium.

Any help is appreciated, since I've been trying to figure this out for a week now. Thanks.

asked Jul 02 '16 21:07 by ScrapyNoob


People also ask

What is the difference between the quit() and close() methods in Selenium?

driver.quit() quits the driver, closing every associated window. driver.close() closes the currently focused window, quitting the driver if the current window is the only open window.

Can we use close() and quit() together in Selenium?

driver.close() closes only the current window on which Selenium is running automated tests; the WebDriver session, however, remains active. driver.quit(), on the other hand, closes all browser windows and ends the WebDriver session, so a final quit() already covers what close() would do.
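To make the difference concrete, here is a minimal sketch, assuming a Selenium Python WebDriver object, of using close() on its own: it shuts every window except the one currently focused, and the session stays usable afterwards. The helper name `close_other_windows` is hypothetical, not part of Selenium; it only relies on the real `current_window_handle`, `window_handles`, `switch_to.window()` and `close()` members.

```python
def close_other_windows(driver):
    """Close every window except the current one; the session stays alive.

    `driver` is assumed to be a Selenium WebDriver. close() acts only on the
    window that currently has focus, so we switch to each extra window first.
    """
    keep = driver.current_window_handle
    # window_handles returns a fresh list, so closing windows while
    # iterating over it is safe.
    for handle in driver.window_handles:
        if handle != keep:
            driver.switch_to.window(handle)
            driver.close()
    # close() leaves no window focused, so switch back explicitly before
    # issuing further commands.
    driver.switch_to.window(keep)
```

The driver is still alive after this call; driver.quit() is still needed at the end of the run to end the session.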

Why is Selenium so slow?

Selenium WebDriver scripts are slow because every command is sent to and executed by a real browser. Several things can improve their speed, such as using fast selectors and using fewer locators.
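One way to use fewer locators (an illustration, not the only technique) is to collect all matching text with a single `execute_script()` call instead of one `find_element` call, plus one `.text` call, per item; each of those is a separate round trip to the browser. The helper name `extract_texts` and its default `"h2"` selector are hypothetical placeholders for whatever the real page needs.

```python
def extract_texts(driver, css_selector="h2"):
    """Return the trimmed text of every element matching `css_selector`.

    A single execute_script() call replaces one find_element round trip
    (plus one .text round trip) per item. The default selector is only a
    placeholder for whatever the real page uses.
    """
    script = (
        "return Array.from(document.querySelectorAll(arguments[0]))"
        ".map(el => el.textContent.trim());"
    )
    # Extra positional arguments to execute_script are exposed to the
    # script as arguments[0], arguments[1], ...
    return driver.execute_script(script, css_selector)
```

Note that `textContent` also includes text hidden by CSS, whereas `WebElement.text` returns only the rendered, visible text, so the two approaches can give different results on some pages.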

How do I close all Chrome drivers?

browser.close() closes only the current Chrome window. browser.quit() should close all of the open windows and then exit WebDriver.
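A common way to make sure quit() always runs, even when scraping a page raises an exception, is to wrap the driver in a context manager. This is a minimal sketch; `managed_driver` is a hypothetical helper, and `make_driver` is assumed to be any zero-argument callable that returns a WebDriver, such as `webdriver.Chrome`.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Yield a WebDriver built by `make_driver` and always quit it afterwards.

    quit() closes every window and ends the session; in the Selenium 4
    Python bindings it normally also stops the chromedriver/geckodriver
    process that the bindings launched.
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs on normal exit and when the `with` body raises.
        driver.quit()
```

Usage would look like `with managed_driver(webdriver.Chrome) as driver: driver.get(url)`, after which the browser is shut down regardless of how the block exited.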


1 Answer

The only way to force the Python interpreter to release memory to the OS is to terminate the process. Therefore, use multiprocessing to spawn the Selenium Firefox instance; the memory will be freed when the spawned process terminates:

import multiprocessing as mp
import selenium.webdriver as webdriver

def worker():
    driver = webdriver.Firefox()
    try:
        # do memory-intensive work here
        pass
    finally:
        # quitting is not what ultimately frees the memory, but it is good
        # to end the WebDriver session gracefully anyway. quit() closes every
        # window, so a separate close() call is not needed.
        driver.quit()

if __name__ == '__main__':
    p = mp.Process(target=worker)
    # run `worker` in a subprocess
    p.start()
    # make the main process wait for `worker` to end
    p.join()
    # all memory used by the subprocess will be freed to the OS

See also: "Why doesn't Python release the memory when I delete a large object?"
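For the 10k-page case in the question, the same idea can be applied in batches. The following is a sketch under stated assumptions, not the answerer's code: it splits the URL list into batches and uses `multiprocessing.Pool(processes=1, maxtasksperchild=1)` so that every batch runs in a brand-new child process, which exits (returning its memory to the OS) once the batch is done. The names `chunked`, `scrape_batch`, `scrape_all` and `extract`, and the default batch size of 100, are all hypothetical.

```python
import multiprocessing as mp
from functools import partial


def chunked(items, size):
    """Split `items` into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scrape_batch(urls, make_driver, extract):
    """Scrape one batch of URLs with a single browser, then quit it.

    `make_driver` builds a WebDriver (e.g. webdriver.Firefox) and `extract`
    pulls data out of the loaded page; both are hypothetical callables that
    must be defined at module level so they can be pickled.
    """
    driver = make_driver()
    try:
        results = []
        for url in urls:
            driver.get(url)
            results.append(extract(driver))
        return results
    finally:
        driver.quit()


def scrape_all(urls, make_driver, extract, batch_size=100):
    """Run each batch in a fresh child process and collect results in order."""
    job = partial(scrape_batch, make_driver=make_driver, extract=extract)
    results = []
    # processes=1: one browser at a time. maxtasksperchild=1: the worker
    # process exits after each batch and the pool starts a new one.
    with mp.Pool(processes=1, maxtasksperchild=1) as pool:
        # chunksize=1 makes each batch its own task, so each batch gets
        # its own process; imap yields results in submission order.
        for batch_results in pool.imap(job, chunked(urls, batch_size), chunksize=1):
            results.extend(batch_results)
    return results
```

Call `scrape_all` from under `if __name__ == '__main__':`, for example `scrape_all(urls, webdriver.Firefox, page_title)`, where `page_title` is a hypothetical module-level function returning `driver.title`. `extract` must return plain, picklable data (strings, dicts) rather than WebElement objects, because results travel back to the parent process through a pipe. A smaller batch size lowers the memory ceiling at the cost of more browser start-ups.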

answered Nov 10 '22 12:11 by unutbu