My iOS app has had intermittent SSL errors when making HTTPS requests to the backend for several months.
The error description:
An SSL error has occurred and a secure connection to the server cannot be made.
The console logs when in debug mode:
2019-07-06 15:12:37.012198+0100 MyApp[37255:12499941] [BoringSSL] nw_protocol_boringssl_input_finished(1543) [C2.1:2][0x159e8e4a0] Peer disconnected during the middle of a handshake. Sending errSSLClosedNoNotify(-9816) alert
2019-07-06 15:12:37.026641+0100 MyApp[37255:12499941] TIC TCP Conn Failed [2:0x280486d00]: 3:-9816 Err(-9816)
2019-07-06 15:12:37.027759+0100 MyApp[37255:12499941] NSURLSession/NSURLConnection HTTP load failed (kCFStreamErrorDomainSSL, -9816)
2019-07-06 15:12:37.027839+0100 MyApp[37255:12499941] Task <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1> HTTP load failed (error code: -1200 [3:-9816])
2019-07-06 15:12:37.028016+0100 MyApp[37255:12499941] Task <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1> finished with error - code: -1200
2019-07-06 15:12:37.032759+0100 MyApp[37255:12500041] Task <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1> load failed with error Error Domain=NSURLErrorDomain Code=-1200 "An SSL error has occurred and a secure connection to the server cannot be made." UserInfo={NSErrorFailingURLStringKey=https://api.example.com/v1/example/example?param=example, NSLocalizedRecoverySuggestion=Would you like to connect to the server anyway?, _kCFStreamErrorDomainKey=3, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalDataTask <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1>"
), NSLocalizedDescription=An SSL error has occurred and a secure connection to the server cannot be made., NSErrorFailingURLKey=https://api.example.com/v1/example/example?param=example, NSUnderlyingError=0x283ff2160 {Error Domain=kCFErrorDomainCFNetwork Code=-1200 "(null)" UserInfo={_kCFStreamPropertySSLClientCertificateState=0, _kCFNetworkCFStreamSSLErrorOriginalValue=-9816, _kCFStreamErrorDomainKey=3, _kCFStreamErrorCodeKey=-9816}}, _kCFStreamErrorCodeKey=-9816} [-1200]
The error occurs mainly on 3G/4G, not wifi, and occurs more often when the network signal is low. If it happens once it will keep happening for the next few requests, but will eventually work again shortly thereafter.
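Since the failure clears up after a few requests, one client-side mitigation (not a fix for the root cause) is to retry with a growing delay when this specific error shows up. A minimal sketch, assuming hypothetical helper names `isSecureConnectionFailure` and `backoffDelay`, with illustrative delay values:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URL loading types live here on Linux
#endif

/// True when the error is the NSURLErrorDomain -1200 failure shown in the log.
/// (Hypothetical helper name, not part of any library.)
func isSecureConnectionFailure(_ error: Error) -> Bool {
    let nsError = error as NSError
    return nsError.domain == NSURLErrorDomain
        && nsError.code == NSURLErrorSecureConnectionFailed
}

/// Exponential backoff: base, 2x base, 4x base, ... capped at `cap` seconds.
/// `base` and `cap` are assumed values for illustration, not recommendations.
func backoffDelay(attempt: Int, base: TimeInterval = 0.5, cap: TimeInterval = 8) -> TimeInterval {
    return min(cap, base * pow(2.0, Double(attempt)))
}
```

A caller would check `isSecureConnectionFailure` in the completion handler and, if it is true, schedule the same request again after `backoffDelay(attempt:)` seconds, giving up after a small number of attempts.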
Based on the analytics, user reviews, and user bug reports: it is affecting a large percentage of users, but not 100% of them.
-
The backend is hosted on AWS Elastic Beanstalk. It is served as a Docker app, using an Nginx proxy server, with multiple instances behind a load balancer.
I've tried increasing and decreasing the instance sizes and it seemed to make no difference.
I recently made an entirely new Elastic Beanstalk environment from scratch, to see if that helped. Previously it was using the Classic Load Balancer, now it is using the Application Load Balancer. Early indications are it has reduced the number of SSL errors, but they are still occurring.
The new load balancer is using this SSL policy:
ELBSecurityPolicy-FS-2018-06
Which is defined here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html
Should it be using a different SSL policy?
-
In the app, the web requests were being made using URLSession.shared.dataTask... etc. I've also tried using the Alamofire library to see if that made a difference. It did not.
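For tracking how often this happens, it can help to pull the low-level code (the -9816 in the log) out of the error that the data task hands back. A minimal sketch, assuming the hypothetical helper name `underlyingStreamErrorCode`; note that `_kCFStreamErrorCodeKey` is the undocumented key seen in the log above, so this is best effort:

```swift
import Foundation

/// Returns the low-level stream error code (for example -9816) attached to
/// a URL loading error, or nil if none is present.
/// (Hypothetical helper name, not part of any library.)
func underlyingStreamErrorCode(of error: Error) -> Int? {
    let nsError = error as NSError
    // Undocumented key copied from the console log; it may change between OS versions.
    if let code = nsError.userInfo["_kCFStreamErrorCodeKey"] as? Int {
        return code
    }
    // Otherwise look inside the wrapped error, as in the NSUnderlyingError entry of the log.
    if let underlying = nsError.userInfo[NSUnderlyingErrorKey] as? NSError {
        return underlyingStreamErrorCode(of: underlying)
    }
    return nil
}
```

Calling this on the `error` passed to the `dataTask` completion handler and sending the result to analytics would show whether every failure is -9816 or whether several different handshake errors are mixed together.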
I feel like this may have something to do with Apple's App Transport Security. However, as it fails intermittently I'm at a loss as to how.
The relevant Apple docs are the bottom of this page: https://developer.apple.com/security/
If you need more information to help debug please let me know.
-
UPDATE:
So after trying many of the suggestions (thank you to everyone who contributed!) - and learning a lot more about SSL, load balancers, etc. - I have found something that has fixed the issue.
(Minor caveat: I can't be 100% certain it's completely fixed, due to the intermittent nature of the issue and my not-so-great tracking of it, but all available evidence suggests it is now fixed.)
The "fix" was to move the service to Google Cloud Run, which is basically serverless for Docker containers.
Crucially, Google Cloud automatically handles setting up the SSL certificate, so there were zero parts for me to screw up. Another advantage is that I'm now only paying for the compute time I'm actually using, so it's cheaper.
Apologies to anyone reading this looking for an actual solution to the original problem, but there are a bunch of good things to investigate in the answers and comments below.
Disclaimer: This is not an answer to your question; I'm just thinking out loud with you.
Here are a couple of points I would check to identify the root cause, assuming you have this information or can get it. Otherwise it will remain a black box unless you can co-debug with Amazon.
It seems obvious to me that this is a certificate pinning issue.
Using Wireshark over a 3G modem, check which TLS version the client requests and compare it with what AWS requires. For example, AWS might require TLS 1.2 while you are sending TLS 1.1.
It is critical to compare the certificate string on the server side with the one on the client side manually; it might be encoded differently somewhere along the connection pipeline.
Since you said it fails more often on a slow connection, check the certificate pinning timeout (the server might receive only part of the certificate string, compare it with the one it has, and find a mismatch because of connection latency).
Make sure all instances of the Docker app behind the load balancer have exactly the same version of the certificate you are pinning.
Check which iOS versions the failed connections come from, and what security checks those specific versions apply.
Did you add the App Transport Security Settings keys to your Info.plist file?
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>NSAllowsArbitraryLoads</key>
<true/>
<key>NSAllowsArbitraryLoadsForMedia</key>
<true/>
<key>NSAllowsArbitraryLoadsInWebContent</key>
<true/>
<key>NSExceptionDomains</key>
<dict>
<key>YOUR_SERVER_COM</key>
<dict>
<key>NSExceptionRequiresForwardSecrecy</key>
<false/>
<key>NSIncludesSubdomains</key>
<true/>
</dict>
<key>facebook.com</key>
<dict>
<key>NSExceptionRequiresForwardSecrecy</key>
<false/>
<key>NSIncludesSubdomains</key>
<true/>
</dict>
<key>fbcdn.net</key>
<dict>
<key>NSExceptionRequiresForwardSecrecy</key>
<false/>
<key>NSIncludesSubdomains</key>
<true/>
</dict>
<key>graph.facebook.com</key>
<dict>
<key>NSExceptionRequiresForwardSecrecy</key>
<false/>
<key>NSIncludesSubdomains</key>
<true/>
</dict>
</dict>
</dict>
</plist>
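In a real Info.plist these keys sit inside the `NSAppTransportSecurity` dictionary, and the three `NSAllowsArbitraryLoads...` keys loosen ATS globally rather than for one domain. A minimal sketch for checking at runtime which of them are switched on, assuming the hypothetical helper name `globalATSExemptions`:

```swift
import Foundation

/// Lists the keys in an NSAppTransportSecurity dictionary that loosen ATS
/// for every domain. (Hypothetical helper name, not part of any library.)
func globalATSExemptions(in ats: [String: Any]) -> [String] {
    let globalKeys = [
        "NSAllowsArbitraryLoads",
        "NSAllowsArbitraryLoadsForMedia",
        "NSAllowsArbitraryLoadsInWebContent",
    ]
    // Keep only the keys that are present and set to true.
    return globalKeys.filter { (ats[$0] as? Bool) == true }
}

// Read the ATS settings of the running app; falls back to an empty
// dictionary when the key is absent.
let atsSettings = Bundle.main.object(forInfoDictionaryKey: "NSAppTransportSecurity") as? [String: Any] ?? [:]
print("Global ATS exemptions:", globalATSExemptions(in: atsSettings))
```

With the plist above, this would report all three keys, which means ATS is effectively switched off for the whole app, so the per-domain entries under `NSExceptionDomains` add little on top of that.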
First of all, I've had all the symptoms you described. While searching for a solution I talked to every team: network, security, software, and so on. It is a very difficult problem to solve, but it may be useful to briefly explain how we solved it.
Tip 1: As you can see, the SSL handshake does not always fail; it only sometimes throws errors. An SSL key, or any other file used in your infrastructure, is ultimately a sequence of bytes, and this error can occur when not all of those bytes make it across your network. I found that in my case, and when I debugged the situation it was exactly that: corrupted packets were the cause.
Tip 2: The general reason a request can work for some clients and fail for others is that the server answers some requests from a cache. This is usually related to the load balancer configuration. In my case, a software engineer had replaced cookie-based authentication with another authentication model. That routed requests through a static object in RAM for better performance, which caused problems with byte transfer.
What I strongly recommend: on the server side, check the load balancer properties one by one. Review the life cycle management. You can even switch the load balancer's authentication method to session-based or cookie-based, whichever you need.
I don't know much about your backend architecture (Docker, Nginx). My guess is that your backend was originally written to serve encrypted content to non-mobile browsers, that it was written before the migration to AWS, and that it does backend authentication? Now they have asked you to build the iOS app for the front end and they "lift and shift" the backend into Elastic Beanstalk? This is a good strategy because it's simpler to get going in the cloud and Elastic Beanstalk offers scaling.
The problem with this strategy is that when the original backend's encrypted traffic gets load balanced, and the encrypted sessions are not configured to float correctly between the scaled backend instances, it can break the user's session and you get errors.
Your hunch to create a new Elastic Beanstalk app and try the Application Load Balancer is a good idea, but I found this in the AWS docs on configuring load balancing for Elastic Beanstalk that might contradict it:
Unlike a Classic Load Balancer or a Network Load Balancer, an Application Load Balancer can't have transport layer (layer 4) TCP or SSL/TLS listeners. It supports only HTTP and HTTPS listeners. Additionally, it can't use backend authentication to authenticate HTTPS connections between the load balancer and backend instances.
To rule out the load balancing within Elastic Beanstalk, I would create a new Elastic Beanstalk environment with NO load balancing (or a non-Elastic Beanstalk AWS compute stack) and see whether clients connecting to this new environment still get any of these errors. If there are no errors, then you can confidently tell your team that they need to consider migrating the authentication out of the backend and into AWS services.