My embedded Selenium/PhantomJSDriver client does not seem to clean up its resources. Running the client synchronously leaves millions of open files and eventually throws a "Too many open files" exception.
Here is some output I gathered from lsof after the program had been running for about 1 minute:
$ lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
1221966 12180
34790 29773
31260 12138
20955 8414
17940 10343
16665 32332
9512 27713
7275 19226
5496 7153
5040 14065
$ lsof -p 12180 | awk '{ print $2; }' | uniq -c | sort -rn | head
2859 12180
1 PID
$ lsof -p 12180 -Fn | sort -rn | uniq -c | sort -rn | head
1124 npipe
536 nanon_inode
4 nsocket
3 n/opt/jdk/jdk1.8.0_60/jre/lib/jce.jar
3 n/opt/jdk/jdk1.8.0_60/jre/lib/charsets.jar
3 n/dev/urandom
3 n/dev/random
3 n/dev/pts/20
2 n/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
2 n/usr/share/java/jayatana.jar
I don't understand why using the -p flag on lsof gives a smaller result set, but most of the entries appear to be pipe and anon_inode.
The client is very simple at ~100 lines, and at the end of usage it calls driver.close() and driver.quit(). I experimented with caching and reusing clients, but that did not reduce the number of open files.
case class HeadlessClient(
  country: String,
  userAgent: String,
  inheritSessionId: Option[Int] = None
) {
  protected var numberOfRequests: Int = 0
  protected val proxySessionId: Int = inheritSessionId.getOrElse(new Random().nextInt(Integer.MAX_VALUE))
  protected val address = InetAddress.getByName("proxy.domain.com")
  protected val host = address.getHostAddress
  protected val login: String = HeadlessClient.username + proxySessionId
  protected val windowSize = new org.openqa.selenium.Dimension(375, 667)
  protected val (mobProxy, seleniumProxy) = {
    val proxy = new BrowserMobProxyServer()
    proxy.setTrustAllServers(true)
    proxy.setChainedProxy(new InetSocketAddress(host, HeadlessClient.port))
    proxy.chainedProxyAuthorization(login, HeadlessClient.password, AuthType.BASIC)
    proxy.addLastHttpFilterFactory(new HttpFiltersSourceAdapter() {
      override def filterRequest(originalRequest: HttpRequest): HttpFilters = {
        new HttpFiltersAdapter(originalRequest) {
          override def proxyToServerRequest(httpObject: HttpObject): io.netty.handler.codec.http.HttpResponse = {
            httpObject match {
              case req: HttpRequest => req.headers().remove(HttpHeaders.Names.VIA)
              case _ =>
            }
            null
          }
        }
      }
    })
    proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT)
    proxy.start(0)
    val seleniumProxy = ClientUtil.createSeleniumProxy(proxy)
    (proxy, seleniumProxy)
  }
  protected val driver: PhantomJSDriver = {
    val capabilities: DesiredCapabilities = DesiredCapabilities.chrome()
    val cliArgsCap = new util.ArrayList[String]
    cliArgsCap.add("--webdriver-loglevel=NONE")
    cliArgsCap.add("--ignore-ssl-errors=yes")
    cliArgsCap.add("--load-images=no")
    capabilities.setCapability(CapabilityType.PROXY, seleniumProxy)
    capabilities.setCapability("phantomjs.page.customHeaders.Referer", "")
    capabilities.setCapability("phantomjs.page.settings.userAgent", userAgent)
    capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap)
    new PhantomJSDriver(capabilities)
  }
  driver.executePhantomJS(
    """
      |var navigation = [];
      |
      |this.onNavigationRequested = function(url, type, willNavigate, main) {
      |  navigation.push(url)
      |  console.log('Trying to navigate to: ' + url);
      |}
      |
      |this.onResourceRequested = function(request, net) {
      |  console.log("Requesting " + request.url);
      |  if (! (navigation.indexOf(request.url) > -1)) {
      |    console.log("Aborting " + request.url)
      |    net.abort();
      |  }
      |};
    """.stripMargin
  )
  driver.manage().window().setSize(windowSize)
  def follow(url: String)(implicit ec: ExecutionContext): List[HarEntry] = {
    try {
      Await.result(Future {
        mobProxy.newHar(url)
        driver.get(url)
        val entries = mobProxy.getHar.getLog.getEntries.asScala.toList
        shutdown()
        entries
      }, 45.seconds)
    } catch {
      case e: Exception =>
        try {
          shutdown()
        } catch {
          case shutdown: Exception =>
            throw new Exception(s"Error ${shutdown.getMessage} cleaning up after Exception: ${e.getMessage}")
        }
        throw e
    }
  }
  def shutdown() = {
    driver.close()
    driver.quit()
  }
}
I tried several versions of Selenium in case there was a bugfix. The build.sbt:
libraryDependencies += "org.seleniumhq.selenium" % "selenium-java" % "3.0.1"
libraryDependencies += "net.lightbody.bmp" % "browsermob-core" % "2.1.2"
I also tried PhantomJS 2.0.1 and 2.1.1:
$ phantomjs --version
2.0.1-development
$ phantomjs --version
2.1.1
Is this a PhantomJS or Selenium problem? Is my client using the API improperly?
The resource usage is caused by BrowserMob. To close the proxy and clean up its resources, one must call stop().
For this client, that means modifying the shutdown method:
def shutdown() = {
  mobProxy.stop()
  driver.close()
  driver.quit()
}
Another method, abort, terminates the proxy server immediately and does not wait for traffic to cease.
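Because the proxy and the driver are separate resources, an exception thrown while releasing one of them can leave the other open. A minimal sketch of one way to guard against that is to run each cleanup step independently and collect the failures. Note that `Cleanup.runAll` and the step labels are hypothetical names introduced here for illustration and are not part of BrowserMob or Selenium; the demo uses stand-in steps that only simulate a failing call.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.{Failure, Try}

// Hypothetical helper, not part of BrowserMob or Selenium.
object Cleanup {
  /** Runs every cleanup step in order, even if an earlier step throws a
    * non-fatal exception, and returns each failure with its step label. */
  def runAll(steps: (String, () => Unit)*): List[(String, Throwable)] =
    steps.toList.flatMap { case (label, step) =>
      Try(step()) match {
        case Failure(e) => List((label, e))
        case _          => Nil
      }
    }
}

object CleanupDemo extends App {
  // Stand-in steps: the first simulates a cleanup call that throws,
  // the second simulates one that succeeds.
  val ran = ListBuffer.empty[String]
  val failures = Cleanup.runAll(
    ("proxy", () => { ran += "proxy"; throw new IllegalStateException("already stopped") }),
    ("driver", () => { ran += "driver"; () })
  )
  println(s"steps run: ${ran.mkString(", ")}")
  failures.foreach { case (label, e) => println(s"$label failed: ${e.getMessage}") }
}
```

In the client from the question, shutdown could pass `("mobProxy.stop", () => mobProxy.stop())` and `("driver.quit", () => driver.quit())` as the steps, so that a failure in one does not prevent the other from running.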