 

Need help troubleshooting a .NET Core 2.1 API in a Linux Docker container


We have a bad situation with an API we are running in a Linux Docker container on AWS ECS. The API now runs on ASP.NET Core 2.1, but we also had the problem on ASP.NET Core 2.0 (we hoped upgrading to 2.1 would fix it, but it didn't).

The problem: containers are frequently killed with exit code 139. From my research so far, this means the process received SIGSEGV, i.e. a segmentation fault (139 = 128 + 11, and 11 is the signal number of SIGSEGV). A segfault typically happens when the application tries to access a part of memory that it does not have permission to access.
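Shells and Docker conventionally report a process killed by signal N as exit code 128 + N. A minimal Python sketch of that decoding (the helper name `describe_exit_code` is hypothetical, and Python is used only for illustration; it relies on the standard `signal.Signals` enum, so signal numbers are those of the host OS, Linux here):

```python
import signal

def describe_exit_code(code: int) -> str:
    """Explain an exit status as reported by a POSIX shell or by Docker.

    Statuses above 128 conventionally mean "terminated by signal (status - 128)".
    A program could also exit with such a value on purpose, so this is a hint,
    not proof.
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"status {code} (no known signal {code - 128})"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"

if __name__ == "__main__":
    print(describe_exit_code(139))  # killed by SIGSEGV (signal 11)
    print(describe_exit_code(137))  # killed by SIGKILL (signal 9)
```

Comparing 139 with 137 is a quick way to tell a segfault apart from a container that was killed outright with SIGKILL.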

I would not expect such a thing to happen with managed code, but it might be a library or lower-level function in the framework that triggers this.

We have middleware configured to log unhandled exceptions in the API, but we get no logs at all when this happens, so we don't have much to go on.
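The missing logs are consistent with how fatal signals work: the process is terminated by the kernel, so no exception handler gets a chance to run. As an analogy only (this is Python, not .NET), the sketch below, assuming a Linux host, starts a child interpreter that reads from address 0 via `ctypes.string_at(0)` inside a `try`/`except`. The `except` branch never prints, and the parent sees a negative return code, which `subprocess` uses to mean "terminated by that signal". The helper name `run_crashing_child` is hypothetical.

```python
import signal
import subprocess
import sys

# Child program: dereference address 0 inside try/except. SIGSEGV is not a
# Python exception, so the except branch never runs and nothing is printed.
CHILD = """
import ctypes
try:
    ctypes.string_at(0)  # reads from address 0 -> SIGSEGV
except Exception:
    print("caught")
"""

def run_crashing_child() -> subprocess.CompletedProcess:
    """Run CHILD in a fresh interpreter and capture its output."""
    return subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)

if __name__ == "__main__":
    result = run_crashing_child()
    # On POSIX, returncode == -N means the child was killed by signal N.
    print(result.returncode == -signal.SIGSEGV, repr(result.stdout))
```

A shell running the same child would report its exit status as 128 + 11 = 139.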

I know there is not a lot to go on here, so I am basically looking for ways to get some idea of what the problem might be.

Maybe I could capture a memory dump at the time of the crash, or somehow get more details from Docker or ECS?
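One generic Linux way to get a memory dump at crash time is a core dump, which the kernel writes only if the process's core-file size limit (`RLIMIT_CORE`) allows it. A minimal sketch, assuming a Python interpreter is available in the image (the same effect is more commonly achieved with `ulimit -c` in a shell entrypoint). The wrapper raises the soft limit as far as the hard limit permits and then `exec`s the real server so the limit is inherited. The script and DLL names in the usage line are hypothetical.

```python
import os
import resource
import sys

def raise_core_limit() -> int:
    """Raise the soft core-file size limit to the hard limit; return the new soft limit.

    An unprivileged process may always raise its soft limit up to its hard
    limit, so this needs no extra privileges. If the hard limit is 0, it has
    to be raised outside the container (e.g. Docker's --ulimit option).
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)[0]

if __name__ == "__main__" and len(sys.argv) > 1:
    raise_core_limit()
    # Replace this wrapper with the real server process. Resource limits are
    # inherited across exec, so a SIGSEGV in the server can produce a core file.
    os.execvp(sys.argv[1], sys.argv[1:])
```

Usage would look like `python3 core_wrapper.py dotnet Api.dll`. Where the core file ends up is controlled by `/proc/sys/kernel/core_pattern`, which is a host-wide kernel setting rather than a per-container one. On ECS, the hard limit can be set through the `ulimits` parameter of the container definition.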

Any advice is greatly appreciated!

UPDATE

One of the site reliability engineers here was able to do some more analysis on this. He identified two types of segfaults that kill the containers:

ip-10-50-128-175 kernel: [336491.431816] traps: dotnet[14200] general protection ip:7f7e14fc2529 sp:7f7b41ff8080 error:0 in libc-2.24.so[7f7e14f8e000+195000]

ip-10-50-128-219 kernel: [481011.825532] dotnet[31035]: segfault at 0 ip (null) sp 00007f50897f7658 error 14 in dotnet[400000+18000]

I am not sure what these mean, but I thought I would post them here in case they give someone a hint.
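Both kernel lines can be decoded mechanically. A minimal Python sketch, assuming x86-64 and the message formats shown above (the regexes and function names are hypothetical helpers, not an existing tool). For each line it extracts the instruction pointer and the mapping reported after `in`, and computes the offset of the faulting instruction within that mapping. For `segfault` lines it also decodes the `error` field, which the kernel prints in hex, using the x86 page-fault error-code bits. The `error:0` in the `general protection` line is a different kind of error code and is not decoded here.

```python
import re

# " in <object>[<start>+<size>]" suffix naming the mapping that contains ip.
MAP_RE = re.compile(r" in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]")
# "ip:<hex>" (traps line) or "ip <hex>" / "ip (null)" (segfault line).
IP_RE = re.compile(r" ip[: ](?P<ip>[0-9a-f]+|\(null\))")
# Faulting address and hex page-fault error code of a segfault line.
SEGV_RE = re.compile(r"segfault at (?P<addr>[0-9a-f]+) .* error (?P<err>[0-9a-f]+)")

# x86 page-fault error-code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

def decode_pf_error(err: int) -> list[str]:
    """Turn a page-fault error code into human-readable flags."""
    flags = []
    for bit, if_set, if_clear in PF_BITS:
        if err & bit:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags

def analyze(line: str) -> dict:
    """Extract ip, mapping, offset and (for segfaults) decoded error flags."""
    info = {}
    ip = None
    ip_m = IP_RE.search(line)
    if ip_m:
        info["ip"] = ip_m["ip"]
        if ip_m["ip"] != "(null)":
            ip = int(ip_m["ip"], 16)
    map_m = MAP_RE.search(line)
    if map_m:
        info["object"] = map_m["obj"]
        if ip is not None:
            # Offset of the faulting instruction from the start of the mapping.
            info["offset"] = hex(ip - int(map_m["base"], 16))
    segv_m = SEGV_RE.search(line)
    if segv_m:
        info["fault_addr"] = hex(int(segv_m["addr"], 16))
        info["pf_error"] = decode_pf_error(int(segv_m["err"], 16))
    return info
```

For the first line this gives offset `0x34529` inside `libc-2.24.so`. Assuming that mapping starts at the beginning of the library file (typical for libc's executable segment), the offset could be looked up with a tool such as `addr2line` or `gdb` against the same libc build with debug symbols. For the second line, error `0x14` decodes to a user-mode instruction fetch from a non-present page at address 0, which together with `ip (null)` suggests the process jumped to address 0, for example through a null function pointer.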

UPDATE 2

So, we have not yet been able to determine the root cause of the issue, but we mitigated the crashes by stopping one of our internal services from calling one of the API's endpoints in large volumes. To test whether the crashes would stop, we duplicated that endpoint's logic in the internal service, and the crashes did stop. This is not a very satisfying solution, and it won't really help anyone else experiencing this issue, but at least our API was stable throughout Black Friday and Cyber Monday :)

Søren Pedersen asked Oct 17 '18 14:10





1 Answer

From what I can find about segfaults, it is, as you stated, something trying to access memory that it has been denied access to. In this case, that appears to be NLog.

Try raising the minimum log level of the Microsoft-specific logging categories to Warning and see whether the issue continues:

Change the Microsoft.Extensions.Logging (MEL) configuration to this:

"Logging": {   "LogLevel": {     "Default": "Information",     "Microsoft.AspNetCore.Hosting": "Warning",     "Microsoft.AspNetCore.Infrastructure": "Warning",     "Microsoft.AspNetCore.Routing": "Warning",     "Microsoft.AspNetCore.Mvc": "Warning"  } 

Or, as a last-ditch effort, this:

"Logging": {   "LogLevel": {     "Default": "Information",     "Microsoft.AspNetCore": "Warning"  } 

Neither change will hurt anything, and both are easily reversible if they do not solve your issue.
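The two configurations differ in scope. A simplified Python model of how, as I understand it, Microsoft.Extensions.Logging picks a level for a logger category: the longest configured key that is a case-insensitive prefix of the category name wins, and `Default` applies when nothing else matches. This is not the library's actual code; it ignores provider-specific sections and other filter rules, and the function name and the `MyApi...` category are hypothetical.

```python
def effective_level(category: str, log_levels: dict[str, str]) -> str:
    """Return the configured level for `category` (simplified model).

    The longest key that is a case-insensitive prefix of the category wins;
    "Default" is used when no other key matches.
    """
    best = None
    for key in log_levels:
        if key == "Default":
            continue
        if category.lower().startswith(key.lower()):
            if best is None or len(key) > len(best):
                best = key
    return log_levels[best] if best is not None else log_levels["Default"]

# The two "LogLevel" sections from the answer, as Python dicts.
NARROW = {
    "Default": "Information",
    "Microsoft.AspNetCore.Hosting": "Warning",
    "Microsoft.AspNetCore.Infrastructure": "Warning",
    "Microsoft.AspNetCore.Routing": "Warning",
    "Microsoft.AspNetCore.Mvc": "Warning",
}
BROAD = {"Default": "Information", "Microsoft.AspNetCore": "Warning"}

if __name__ == "__main__":
    for cat in ("Microsoft.AspNetCore.Hosting.Internal.WebHost",
                "Microsoft.AspNetCore.Server.Kestrel",
                "MyApi.Controllers.OrdersController"):  # hypothetical app category
        print(cat, effective_level(cat, NARROW), effective_level(cat, BROAD))
```

Under this model, a category such as `Microsoft.AspNetCore.Server.Kestrel` still logs at Information with the first configuration, while the second lowers every `Microsoft.AspNetCore.*` category to Warning; the application's own categories stay at Information in both.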

If that works, it sounds like a bug in the .NET Core 2.x runtime you are using. I would test on the latest .NET Core release if possible. If the problem still occurs there, it has persisted across several versions, and I would file a bug report with the .NET Core runtime team to see whether they will handle it, or at least point you in the right direction on where to report it.

Hashgrammer answered Sep 26 '22 08:09
