How to connect via HTTPS using Jsoup?

Tags: java, android, https, web-scraping, jsoup

It works fine over HTTP, but when I try to use an HTTPS source it throws the following exception:

    10-12 13:22:11.169: WARN/System.err(332): javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
    10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:477)
    10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:328)
    10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.http.HttpConnection.setupSecureSocket(HttpConnection.java:185)
    10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeSslConnection(HttpsURLConnectionImpl.java:433)
    10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeConnection(HttpsURLConnectionImpl.java:378)
    10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:205)
    10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:152)
    10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:377)
    10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)

Here's the relevant code:

    try {
        doc = Jsoup.connect("https url here").get();
    } catch (IOException e) {
        Log.e("sys", "couldn't get the html");
        e.printStackTrace();
    }

asked Oct 12 '11 17:10

jfisk

2 Answers

If you want to do it the right way, and/or you only need to deal with one site, then you need to grab the SSL certificate of the website in question and import it into a Java key store. This results in a JKS file, which you then set as the SSL trust store before using Jsoup (or java.net.URLConnection).

You can grab the certificate from your web browser's certificate store. Let's assume that you're using Firefox.

1. Go to the website in question using Firefox, which is in your case https://web2.uconn.edu/driver/old/timepoints.php?stopid=10
2. At the left of the address bar you'll see "uconn.edu" in blue (this indicates a valid SSL certificate).
3. Click on it for details and then click the More information button.
4. In the security dialog which appears, click the View Certificate button.
5. In the certificate panel which appears, go to the Details tab.
6. Click the deepest item of the certificate hierarchy, which is in this case "web2.uconn.edu", and finally click the Export button.

Now you have a web2.uconn.edu.crt file.

Next, open the command prompt and import it into a Java key store using the keytool command (it's part of the JRE):

keytool -import -v -file /path/to/web2.uconn.edu.crt -keystore /path/to/web2.uconn.edu.jks -storepass drowssap

The -file option must point to the location of the .crt file which you just exported. The -keystore option must point to the location of the .jks file to be generated (which you will in turn set as the SSL trust store). The -storepass option is required; you can enter whatever password you want as long as it's at least 6 characters long.
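If you would rather do the import from Java code instead of the command line, the same idea can be sketched with the standard java.security classes. This is only a rough equivalent of the keytool command above, not what keytool does internally; the class name CertImportSketch, its method names and the alias parameter are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

// Hypothetical helper: puts one exported certificate into a fresh JKS key store.
final class CertImportSketch {

    // Reads the exported .crt file and stores it as a trusted entry under the given alias.
    static KeyStore importCertificate(Path crtFile, String alias)
            throws IOException, GeneralSecurityException {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        Certificate certificate;
        try (InputStream in = Files.newInputStream(crtFile)) {
            certificate = factory.generateCertificate(in);
        }
        KeyStore keyStore = KeyStore.getInstance("JKS");
        keyStore.load(null, null); // start from an empty in-memory key store
        keyStore.setCertificateEntry(alias, certificate);
        return keyStore;
    }

    // Writes the key store to disk, protected by the given password.
    static void save(KeyStore keyStore, Path jksFile, char[] password)
            throws IOException, GeneralSecurityException {
        try (OutputStream out = Files.newOutputStream(jksFile)) {
            keyStore.store(out, password);
        }
    }
}
```

Calling importCertificate with the path of the .crt file and then save with the target .jks path and password should give you a file which can be used the same way as the one produced by keytool.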

Now you have a web2.uconn.edu.jks file. You can finally set it as the SSL trust store before connecting, as follows:

    System.setProperty("javax.net.ssl.trustStore", "/path/to/web2.uconn.edu.jks");
    Document document = Jsoup.connect("https://web2.uconn.edu/driver/old/timepoints.php?stopid=10").get();
    // ...
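Keep in mind that the javax.net.ssl.trustStore system property applies to the whole JVM and replaces the default trust store. If you would rather scope the trust store to individual connections, you could build an SSLSocketFactory from the JKS file yourself. The following is a minimal sketch using only the standard JSSE classes; the class name TrustStoreSketch and the method name socketFactoryFor are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: turns a JKS trust store file into an SSLSocketFactory.
final class TrustStoreSketch {

    static SSLSocketFactory socketFactoryFor(Path jksFile, char[] password)
            throws IOException, GeneralSecurityException {
        // Load the key store which was created with keytool.
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(jksFile)) {
            trustStore.load(in, password);
        }

        // Derive trust managers which trust only the certificates in that store.
        TrustManagerFactory trustManagers =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        trustManagers.init(trustStore);

        // No client key material (null) and the default source of randomness (null).
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustManagers.getTrustManagers(), null);
        return context.getSocketFactory();
    }
}
```

If your Jsoup version offers Connection.sslSocketFactory(...), the returned factory could be passed there, e.g. Jsoup.connect(url).sslSocketFactory(factory).get(). Otherwise it can be set on a javax.net.ssl.HttpsURLConnection directly.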

As a completely different alternative, particularly when you need to deal with multiple sites (e.g. you're creating a world wide web crawler), you can also instruct Jsoup (basically, java.net.URLConnection) to blindly trust all SSL certificates. See also the section "Dealing with untrusted or misconfigured HTTPS sites" at the very bottom of this answer: Using java.net.URLConnection to fire and handle HTTP requests
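To give an idea of what "blindly trust all" looks like in code, here is a minimal sketch based on the standard JSSE API. It is not copied from the linked answer; the class name TrustAllSketch and its members are made up for illustration. It disables certificate validation entirely, so it fits only the crawler-like case described above.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: an SSL setup which accepts every certificate chain.
final class TrustAllSketch {

    // A trust manager whose checks never throw, so every certificate is accepted.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: no validation.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: no validation.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    static SSLSocketFactory trustAllSocketFactory() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
        return context.getSocketFactory();
    }

    // Applies the setup to every HttpsURLConnection which is created afterwards in this JVM.
    static void installAsJvmDefault() throws GeneralSecurityException {
        HttpsURLConnection.setDefaultSSLSocketFactory(trustAllSocketFactory());
        // Also accept certificates whose host name does not match the URL.
        HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true);
    }
}
```

Calling TrustAllSketch.installAsJvmDefault() once before Jsoup.connect(...) should be enough when Jsoup goes through HttpsURLConnection, as the stack trace in the question suggests it does.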

answered Sep 23 '22 08:09

BalusC

In my case, all I needed to do was add .validateTLSCertificates(false) to my connection:

    Document doc = Jsoup.connect(httpsURLAsString)
            .timeout(60000)
            .validateTLSCertificates(false)
            .get();

I also had to increase the read timeout, but I think that is unrelated to the SSL problem.

answered Sep 20 '22 08:09

johnmerm
