iOS app -- no cellular access to our domain on some devices

With a React Native app (we have only tested apps generated with create-react-app), some iPhone users experience an issue where the app can almost never make web requests to our API over cellular data. The affected domain points to an Amazon Elastic Load Balancer (Layer 7, SSL termination), which forwards to an Nginx reverse proxy inside an EKS Kubernetes cluster. Other APIs called by the app (e.g. Mapbox) work fine over cellular data, including one of ours hosted on a dedicated server; only requests to our ELB domain fail. When the user switches to WiFi, the app can reach that domain again. This has been observed on the iPhone 7, iPhone 8, and iPhone X, all running iOS 12.3.1. Of the six devices reported, one is on Verizon and the other five are on AT&T. Every API call uses HTTPS. Deleting and reinstalling the app and restarting the device do not resolve the issue. In every case we confirmed that cellular data was enabled for the app in Settings > Cellular > [App name] and in Settings > [App name] > Use Cellular Data.

The app is built with React Native and web requests are performed with the cross-fetch library.

We were able to get a device that has the issue and run it through Xcode. Here is a subset of the error stack captured in Xcode:

nw_connection_copy_connected_local_endpoint [C12] Connection has no local endpoint
2019-06-27 11:26:16.841347-0400 myapp[23700:1527268] [BoringSSL] nw_protocol_boringssl_get_output_frames(1301) [C10.1:2][0x117d5a050] get output frames failed, state 8196
2019-06-27 11:26:22.465855-0400 myapp[23700:1527305] [BoringSSL] nw_protocol_boringssl_error(1584) [C20.1:2][0x119b0e420] Lower protocol stack error: 54
2019-06-27 11:26:22.466665-0400 myapp[23700:1527305] TIC TCP Conn Failed [20:0x280022400]: 1:54 Err(54)
2019-06-27 11:26:23.040101-0400 myapp[23700:1527399] Task <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7> HTTP load failed (error code: -1005 [1:54])
2019-06-27 11:26:23.040408-0400 myapp[23700:1527305] Task <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7> finished with error - code: -1005
load failed with error Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={_kCFStreamErrorCodeKey=54, NSUnderlyingError=0x283a521f0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x28161ab70 [0x1e9e5d420]>{length = 16, capacity = 16, bytes = 0x100201bb3416ca8a0000000000000000}, _kCFStreamErrorCodeKey=54, _kCFStreamErrorDomainKey=1}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <DD5FDD4A-1BE0-41ED-AAC4-9EB07F61F109>.<7>"
), NSLocalizedDescription=The network connection was lost.
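Several values in this log are numeric codes that are easier to read once decoded. The sketch below is a minimal, hedged decoder, assuming the `NSErrorPeerAddressKey` bytes are a Darwin `sockaddr_in` (on Darwin the first byte is `sin_len`, then `sin_family`, the port in network byte order, and the IPv4 address). It also maps error 54 in stream error domain 1 (POSIX) to its Darwin name; Darwin's errno numbers differ from Linux, so the table is hard-coded rather than taken from the host's `errno` module. Read this way, the failing task was talking to an IPv4 address on port 443, and the underlying TCP connection received a reset (`ECONNRESET`) while BoringSSL was using it, which iOS surfaces as -1005 `NSURLErrorNetworkConnectionLost`.

```python
import ipaddress
import struct
from typing import Tuple

# Darwin (iOS/macOS) POSIX errno values; these differ from Linux
# (where ECONNRESET is 104), so the host's errno module is not used.
DARWIN_ERRNO = {
    54: "ECONNRESET (connection reset by peer)",
    60: "ETIMEDOUT (operation timed out)",
    61: "ECONNREFUSED (connection refused)",
}

# A few NSURLErrorDomain codes for comparison.
NSURL_ERRORS = {
    -1001: "NSURLErrorTimedOut",
    -1004: "NSURLErrorCannotConnectToHost",
    -1005: "NSURLErrorNetworkConnectionLost",
    -1009: "NSURLErrorNotConnectedToInternet",
}

AF_INET = 2  # same value on Darwin and Linux


def decode_sockaddr_in(hex_bytes: str) -> Tuple[str, int]:
    """Decode a Darwin sockaddr_in hex dump into (IPv4 address, port).

    Assumed layout: sin_len (1 byte), sin_family (1 byte),
    sin_port (2 bytes, network order), sin_addr (4 bytes), sin_zero (8 bytes).
    """
    if hex_bytes.startswith("0x"):
        hex_bytes = hex_bytes[2:]
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 8:
        raise ValueError("too short for a sockaddr_in")
    sin_len, sin_family, port = struct.unpack_from("!BBH", raw, 0)
    if sin_family != AF_INET or sin_len != 16:
        raise ValueError("not an IPv4 sockaddr_in")
    address = ipaddress.IPv4Address(raw[4:8])  # 4 packed bytes -> dotted quad
    return str(address), port


if __name__ == "__main__":
    # Bytes copied from NSErrorPeerAddressKey in the log.
    print(decode_sockaddr_in("0x100201bb3416ca8a0000000000000000"))  # ('52.22.202.138', 443)
    print(DARWIN_ERRNO[54])
    print(NSURL_ERRORS[-1005])
```

The decoded port 443 matches the HTTPS requests to the ELB domain; the address itself is just whichever load balancer node the device happened to connect to.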

Queries to this particular setup ([ELB] -> [Nginx container] -> [Service containers]) occasionally work but then stop. It looks almost like a keep-alive problem, like this issue. The ELB idle timeout was at its default (60s); we increased it to 300s with no apparent effect. We also tried setting the Nginx keep-alive timeout to 360s and to 0s (disabled completely).
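The keep-alive hypothesis can be made concrete with a toy model. This is an illustration of the suspected failure mode, not a diagnosis of what iOS actually does: it assumes a single pooled connection, a server side (here, the load balancer) that closes connections after `server_idle` seconds, and a client that keeps idle connections for `client_idle` seconds. Both parameters and the function name `simulate_keep_alive` are hypothetical. A "stale" request is one where the client reuses a connection the server has already closed, which on a real network tends to show up as a reset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestOutcome:
    at: float     # request time in seconds
    reused: bool  # client reused a pooled keep-alive connection
    stale: bool   # ...that the server side had already closed


def simulate_keep_alive(request_times: List[float], server_idle: float,
                        client_idle: float) -> List[RequestOutcome]:
    """Toy model of one pooled keep-alive connection (hypothetical parameters).

    server_idle: idle seconds after which the server closes the connection.
    client_idle: idle seconds the client keeps the connection in its pool.
    """
    outcomes: List[RequestOutcome] = []
    last_used: Optional[float] = None
    for t in sorted(request_times):
        gap = None if last_used is None else t - last_used
        if gap is None or gap > client_idle:
            # Nothing pooled (or the client evicted it): open a fresh connection.
            outcomes.append(RequestOutcome(t, reused=False, stale=False))
        else:
            # Reuse is stale if the server closed the connection before now.
            outcomes.append(RequestOutcome(t, reused=True, stale=gap > server_idle))
        # Assume a failed stale request is retried on a new connection at t,
        # so either way a live connection was last used at time t.
        last_used = t
    return outcomes


if __name__ == "__main__":
    times = [0, 30, 100, 175]
    for limit in (60, 300):
        stale = [o.at for o in simulate_keep_alive(times, server_idle=limit, client_idle=90) if o.stale]
        print(limit, stale)  # 60 [100, 175], then 300 []
```

In this model, raising the server idle timeout above the client's pool lifetime removes every stale reuse, which is why increasing the ELB idle timeout was a reasonable experiment. That it made no difference in practice is consistent with the answer below: the trigger was apparently not a simple idle-timeout race.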

The domain in question fronts a mix of services hosted in the Kubernetes cluster, written in Java and Node.js among others. The issue affects all of them equally.

None of the Android app users have reported this issue.

The devices that experience this issue do so consistently; it is not intermittent.

Due to the type of error, the requests never reach our Nginx logs.

jowo asked May 21 '19 17:05


1 Answer

Unfortunately, we never found a clear answer to the problem, but we did implement a workaround.

Certain iPhones running iOS 12.3.1 on cellular networks seem to have a problem with the fact that Amazon's ELB Classic always sends a "Connection: keep-alive" response header. You can change the load balancer's idle timeout, but you cannot set it to 0 (the minimum is 1 second). We can reproduce the iOS connection errors with a new app generated by create-react-app: requests always work at first and then start to fail consistently.

We fixed the problem by switching from the ELB to a Network Load Balancer (AWS NLB), which talks directly to an Nginx ingress controller. Because the NLB operates at the TCP level, it does not change the HTTP headers, and the default Nginx ingress controller does not send a "Connection" response header at all. With this new setup, the iOS app works fine on all devices.
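The difference the answer relies on is observable from outside: whether the endpoint's responses carry `Connection: keep-alive`. A minimal standard-library sketch for checking that before and after a load balancer change might look like this. It sends an HTTPS `HEAD` over HTTP/1.1 with `http.client`; the helper names are hypothetical, and the check only reproduces the header observation, it does not prove why iOS drops the connection.

```python
import http.client
from typing import Optional


def fetch_connection_header(host: str, path: str = "/", timeout: float = 10.0) -> Optional[str]:
    """Send an HTTPS HEAD request and return the Connection response header (None if absent)."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # HEAD responses have no body; this just completes the exchange
        return response.getheader("Connection")
    finally:
        conn.close()


def sends_keep_alive(connection_value: Optional[str]) -> bool:
    """True if a Connection header value lists the keep-alive option (case-insensitive)."""
    if connection_value is None:
        return False
    # Connection is a comma-separated list of options, e.g. "Upgrade, keep-alive".
    options = [token.strip().lower() for token in connection_value.split(",")]
    return "keep-alive" in options
```

Usage, with `api.example.com` standing in for the real domain: `sends_keep_alive(fetch_connection_header("api.example.com"))`. Per the answer, this should be true behind the classic ELB and false behind the NLB with the default ingress controller.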

jowo answered Sep 19 '22 12:09