I have written a piece of software in Java that checks whether proxies are working by sending an HTTP request through each proxy.
It takes around 30,000 proxies from a database, then attempts to check if they are operational. The proxies received from the database used to be returned as an ArrayList<String>, but this has been changed to a Deque<String> for reasons stated below.
The way the program works is that there is a ProxyRequest object that stores the IP and port as a String and an int respectively. The ProxyRequest object has a method isWorkingProxy() which attempts to send a request through the proxy and returns a boolean indicating whether it was successful.
This ProxyRequest class is extended by a RunnableProxyRequest class that calls super.isWorkingProxy() in the overridden run() method. Based on the response from super.isWorkingProxy(), the RunnableProxyRequest object updates a MySQL database.
Do note that the updating of the MySQL database is synchronized.
It runs on 750 threads using a FixedThreadPool (on a VPS), but towards the end it becomes very slow (stuck on ~50 threads), which I assume means the garbage collector is at work. This is the problem.
I have attempted the following to reduce the lag, but neither seems to work:
1) Using a Deque<String> proxies and calling Deque.pop() to obtain the String containing the proxy. This (I believe) continuously makes the Deque<String> smaller, which should reduce lag caused by the GC.
2) Calling con.setConnectTimeout(this.timeout) with this.timeout = 5000. This way, the connection should return a result within 5 seconds. If not, the thread is completed and should no longer be active in the thread pool.
Besides this, I don't know any other way I can improve performance.
Can anyone recommend a way to improve performance and avoid the lag that I believe the GC causes towards the end of the run? I know there is a Stack Overflow question about this (Java threads slow down towards the end of processing), but I have tried everything in the answer and it has not worked for me.
Thank you for your time.
Code snippets:
Loop adding threads to the FixedThreadPool:
//This code is executed recursively (at the end, main(args) is called again)
//Create the threadpool for requests
//Threads is an argument that is set to 750.
ThreadPoolExecutor executor = (ThreadPoolExecutor)Executors.newFixedThreadPool(threads);
Deque<String> proxies = DB.getProxiesToCheck();
while(!proxies.isEmpty()) {
    try {
        String[] split = proxies.pop().split(":");
        Runnable[] checks = new Runnable[] {
            //HTTP check
            new RunnableProxyRequest(split[0], split[1], Proxy.Type.HTTP, false),
            //SSL check
            new RunnableProxyRequest(split[0], split[1], Proxy.Type.HTTP, true),
            //SOCKS check
            new RunnableProxyRequest(split[0], split[1], Proxy.Type.SOCKS, false)
            //Add more checks to this list as time goes...
        };
        for(Runnable check : checks) {
            executor.submit(check);
        }
    } catch(IndexOutOfBoundsException e) {
        continue;
    }
}
ProxyRequest class:
//Proxy details
private String proxyIp;
private int proxyPort;
private Proxy.Type testingType;
//Request details
private boolean useSsl;

public ProxyRequest(String proxyIp, String proxyPort, Proxy.Type testingType, boolean useSsl) {
    this.proxyIp = proxyIp;
    try {
        this.proxyPort = Integer.parseInt(proxyPort);
    } catch(NumberFormatException e) {
        this.proxyPort = -1;
    }
    this.testingType = testingType;
    this.useSsl = useSsl;
}

public boolean isWorkingProxy() {
    //Case of an invalid proxy
    if(proxyPort == -1) {
        return false;
    }
    HttpURLConnection con = null;
    //Perform checks on URL
    //IF any exception occurs here, the proxy is obviously bad.
    try {
        URL url = new URL(this.getTestingUrl());
        //Create proxy
        Proxy p = new Proxy(this.testingType, new InetSocketAddress(this.proxyIp, this.proxyPort));
        //No redirect
        HttpURLConnection.setFollowRedirects(false);
        //Open connection with proxy
        con = (HttpURLConnection)url.openConnection(p);
        //Set the request method
        con.setRequestMethod("GET");
        //Set max timeout for a request.
        con.setConnectTimeout(this.timeout);
    } catch(MalformedURLException e) {
        System.out.println("The testing URL is bad. Please fix this.");
        return false;
    } catch(Exception e) {
        return false;
    }
    try(
        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
    ) {
        String inputLine = null;
        StringBuilder response = new StringBuilder();
        while((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        //A valid proxy!
        return con.getResponseCode() > 0;
    } catch(Exception e) {
        return false;
    }
}
RunnableProxyRequest class:
public class RunnableProxyRequest extends ProxyRequest implements Runnable {
    public RunnableProxyRequest(String proxyIp, String proxyPort, Proxy.Type testingType, boolean useSsl) {
        super(proxyIp, proxyPort, testingType, useSsl);
    }

    @Override
    public void run() {
        String test = super.getTest();
        if(super.isWorkingProxy()) {
            System.out.println("-- Working proxy: " + super.getProxy() + " | Test: " + test);
            this.updateDB(true, test);
        } else {
            System.out.println("-- Not working: " + super.getProxy() + " | Test: " + test);
            this.updateDB(false, test);
        }
    }

    private void updateDB(boolean success, String testingType) {
        switch(testingType) {
            case "SSL":
                DB.updateSsl(super.getProxyIp(), super.getProxyPort(), success);
                break;
            case "HTTP":
                DB.updateHttp(super.getProxyIp(), super.getProxyPort(), success);
                break;
            case "SOCKS":
                DB.updateSocks(super.getProxyIp(), super.getProxyPort(), success);
                break;
            default:
                break;
        }
    }
}
DB class:
//Locker for async
private static Object locker = new Object();

private static void executeUpdateQuery(String query, String proxy, int port, boolean toSet) {
    synchronized(locker) {
        //Some prepared statements here.
    }
}
Thanks to Peter Lawrey for guiding me to the solution! :)
His comment:
@ILoveKali I have found network libraries are not aggressive enough in shutting down a connection when things go really wrong. Timeouts tend to work best when the connection is fine. YMMV
So I did some research and found that I also had to call setReadTimeout(this.timeout). Previously, I was only calling setConnectTimeout(this.timeout)!
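As a minimal sketch of that change (the class name TimeoutConfig and the helper open are hypothetical and are not part of the code in the question), both timeouts can be set in one place before the connection is used. Nothing is sent over the network until the response is actually requested, so the configured values can be read back and checked:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URL;

// Hypothetical helper, not taken from the question's code.
class TimeoutConfig {
    // Prepares (but does not yet connect) an HTTP connection through the given proxy,
    // applying the same limit in milliseconds to connecting and to reading.
    static HttpURLConnection open(URL url, Proxy proxy, int timeoutMillis) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection(proxy);
        con.setRequestMethod("GET");
        // Limits how long establishing the TCP connection may take.
        con.setConnectTimeout(timeoutMillis);
        // Limits how long a single read may block while waiting for data.
        con.setReadTimeout(timeoutMillis);
        return con;
    }

    public static void main(String[] args) throws IOException {
        // Proxy.NO_PROXY and the URL are placeholders for illustration only.
        HttpURLConnection con = open(new URL("http://example.com/"), Proxy.NO_PROXY, 5000);
        System.out.println(con.getConnectTimeout() + " " + con.getReadTimeout());
    }
}
```

In isWorkingProxy() this would amount to adding one con.setReadTimeout(this.timeout); line next to the existing con.setConnectTimeout(this.timeout); call.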
Thanks to this post (HttpURLConnection timeout defaults) that explained the following:
Unfortunately, in my experience, it appears using these defaults can lead to an unstable state, depending on what happens with your connection to the server. If you use an HttpURLConnection and don't explicitly set (at least read) timeouts, your connection can get into a permanent stale state. By default. So always set setReadTimeout to "something" or you might orphan connections (and possibly threads depending on how your app runs).
So the final answer is: the GC was doing just fine and was not responsible for the lag. The threads were simply stuck forever at a single number because I did not set the read timeout, so the isWorkingProxy() method never got a result and kept reading.
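The stall can be reproduced locally with a rough sketch like the one below. It assumes a stand-in "server" that accepts TCP connections but never sends a response, which is roughly how a half-dead proxy behaves; the class name ReadTimeoutDemo and the method timesOutOnSilentServer are hypothetical. With a read timeout set, the request should end in a SocketTimeoutException instead of blocking the worker thread indefinitely:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical demo class, not taken from the question's code.
class ReadTimeoutDemo {
    // Returns true if the request was cut off by the read timeout.
    static boolean timesOutOnSilentServer(int timeoutMillis) throws IOException {
        // A listening socket that never calls accept(): the OS completes the TCP
        // handshake from its backlog, but no HTTP response is ever written.
        try (ServerSocket silent = new ServerSocket(0)) {
            URL url = new URL("http://127.0.0.1:" + silent.getLocalPort() + "/");
            HttpURLConnection con = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            con.setConnectTimeout(timeoutMillis);
            con.setReadTimeout(timeoutMillis);
            try (InputStream in = con.getInputStream()) {
                return false; // a response arrived, which is not expected here
            } catch (SocketTimeoutException e) {
                return true;  // the read timeout fired instead of blocking forever
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Timed out: " + timesOutOnSilentServer(500));
    }
}
```

Without the setReadTimeout call the default read timeout is 0, which means wait indefinitely, so the same request would keep its pool thread busy until the remote side closed the connection.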