Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Resolve managed and native stack trace - which API to use?

This is continuation to my previous question - phase 2 so to say.

First question was here: Fast capture stack trace on windows / 64-bit / mixed mode

Now I have resolved a huge amount of stack traces and now wondering how to resolve symbol information of managed stack frames.

For native C++ side it's relatively simple -

First you specify which process from where to take symbols:

HANDLE g_hProcess = GetCurrentProcess();

Where you can replace process in run-time using code snipet like this:

g_hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, g_processId);

b = (g_hProcess != NULL );

if( !b )
    errInfo.AppendFormat(_T("Process id '%08X' is not running anymore."), g_processId );
else
    InitSymbolLoad();

And initialize symbol loading:

void InitSymbolLoad()
{
    SymInitialize(g_hProcess, NULL, TRUE);
    DWORD dwFlags = SymGetOptions();
    SymSetOptions(SymGetOptions() | SYMOPT_DEFERRED_LOADS | SYMOPT_NO_IMAGE_SEARCH);
}

And after that resolve native symbol , somehow like this:

extern HANDLE g_hProcess;

void StackFrame::Resolve()
{
    struct {
        union
        {
            SYMBOL_INFO symbol;
            char buf[sizeof(SYMBOL_INFO) + 1024];
        }u;
    }ImageSymbol = { 0 };

    HANDLE hProcess = g_hProcess;
    DWORD64 offsetFromSymbol = 0;

    ImageSymbol.u.symbol.SizeOfStruct = sizeof(SYMBOL_INFO);
    ImageSymbol.u.symbol.Name[0] = 0;
    ImageSymbol.u.symbol.MaxNameLen = sizeof(ImageSymbol) - sizeof(SYMBOL_INFO);
    SYMBOL_INFO* pSymInfo = &ImageSymbol.u.symbol;

    // Get file / line of source code.
    IMAGEHLP_LINE64 lineStr = { 0 };
    lineStr.SizeOfStruct = sizeof(IMAGEHLP_LINE64);

    function.clear();


    if( SymGetLineFromAddr64(hProcess, (DWORD64)ip, (DWORD*)&offsetFromSymbol, &lineStr) )
    {
        function = lineStr.FileName;
        function += "(";
        function += std::to_string((_ULonglong) lineStr.LineNumber).c_str();
        function += "): ";
    }

    // Successor of SymGetSymFromAddr64.
    if( SymFromAddr(hProcess, (DWORD64)ip, &offsetFromSymbol, pSymInfo) )
        function += ImageSymbol.u.symbol.Name;

}

This looks like working.

But now also managed stack frames.

There are two interfaces which I've located:

  1. IDebugClient / GetNameByOffset

Mentioned in:

  • http://www.codeproject.com/Articles/371137/A-Mixed-Mode-Stackwalk-with-the-IDebugClient-Inter (*) (Includes sample code)
  • http://blog.steveniemitz.com/building-a-mixed-mode-stack-walker-part-1/

Used by:

  • https://github.com/okigan/CrashInsight (Code not touched for 4 years)
  • Mixed mode stackwalk article provides good example.

    1. IXCLRDATAProcess / GetRuntimeNameByAddress
  • Mentioned also in two links above.

  • Used by process hacker (GPL license, C style)

Implementation seems to reside in here:

  • https://github.com/dotnet/coreclr/blob/master/src/debug/daccess/daccess.cpp (Based on commits this code is quite alive)

    1. ICorProfiler / ???

Mentioned at the end of (*) article.

Approach 1 seems to be quite old fashioned, also article (*) mentions some problems around it.

Approach 3 will probably require in-depth analysis of profiling API's. There is also one mention I have found about these API's - in here:

https://naughter.wordpress.com/2015/05/24/changes-in-the-windows-10-sdk-compared-to-windows-8-1-part-two/

· cor.h, cordebug.h/idl, CorError.h, CorHdr.h, corhlpr.h, corprof.h/idl, corpub.h/idl & corsym.h/idl: All of these header files have been removed. They are all the native mode COM interface to .NET.

This sentence I don't fully understand. Are those interfaces dead or replaced or what happened to them ?

So I guess based on my brief analysis approach 2 is only good / alive API interface which is worth of using ? Have you came across any problems related to those api's.

like image 735
TarmoPikaro Avatar asked Jan 11 '16 23:01

TarmoPikaro


People also ask

What is a stack trace and how can I use it to debug my application errors?

In addition to telling you the exact line or function that caused a problem, a stack trace also tracks important metrics that monitor the health of your application. For example, if the average number of stack traces found in your logs increases, you know a bug has likely occurred.

What is the difference between call stack and stack trace?

A call stack is typically "the current stack of operations" - i.e. while it's running. A stack trace is typically a copy of the call stack which is logged at some sort of failure, e.g. an exception.

What is a stack trace and what is it useful for?

A stack trace is a report that provides information about program subroutines. It is commonly used for certain kinds of debugging, where a stack trace can help software engineers figure out where a problem lies or how various subroutines work together during execution.

How to resolve a stacktrace?

Now, in order to resolve the stack trace, StackTrace.JS needs to have access to all the original code and source maps. That means we need to do a bit of work before we can continue.

What do log management and stack traces have to do with each other?

You might wonder what log management and stack traces have to do with each other, and actually, they’re very compatible. It’s best practice for your DevOps team to implement a logging solution. Without an active logging solution, it’s much harder to read and search for stack traces.

Should I use @React-navigation/stack or @ react-navigation/native-stack?

So if you need more customization than what's possible in this navigator, consider using @react-navigation/stack instead - which is a more customizable JavaScript based implementation. To use this navigator, ensure that you have @react-navigation/native and its dependencies (follow this guide), then install @react-navigation/native-stack:

What is native stack navigator?

Native Stack Navigator Native Stack Navigator provides a way for your app to transition between screens where each new screen is placed on top of a stack.


1 Answers

After walking through huge amount of code samples and interfaces, I've understood that there aren't any simple to use API interface. Code and API's developed for native C++ works only with native C++, and code and API's developed for managed code works only with managed code.

There is additionally problem of resolving stack trace afterwards might not work. You see - developer can generate code dynamically on fly using Jit engine / IL Generator, and dispose it as well - so after you have "void*" / instruction address - you should resolve symbolic information right away, not afterwards. But I'll leave this for time being, will assume that developer is not too fancy coder and not generating and disposing new code all the times, and FreeLibrary will not be called without need. (May be I can address this later on if I'll hook FreeLibrary / Jit components.)

Resolving function name was quite trivial, through IXCLRDataProcess with little bit of magic and luck - I was able to get function names, however - I want to expand it deeper - into exact source code path and source code line where code were executing, and this turned to be quite complex functionality to reach.

Finally I've hit upon source code where such thing were performed - and it was done here:

https://github.com/dotnet/coreclr/blob/master/src/ToolBox/SOS/Strike/util.cpp

GetLineByOffset is function name in that file.

I've analyzed, retuned and made my own solution from that source code, which I'm now attaching here now:

Updated code can be found from here: https://sourceforge.net/projects/diagnostic/

But here is just a snapshot of same code taken at some point of time:

ResolveStackM.h:

#pragma once
#include <afx.h>
#pragma warning (disable: 4091)     //dbghelp.h(1544): warning C4091: 'typedef ': ignored on left of '' when no variable is declared
#include <cor.h>                    //xclrdata.h requires this
#include "xclrdata.h"               //IXCLRDataProcess
#include <atlbase.h>                //CComPtr
#include <afxstr.h>                 //CString
#include <crosscomp.h>              //TCONTEXT
#include <Dbgeng.h>                 //IDebugClient
#pragma warning (default: 4091)

class ResoveStackM
{
public:
    ResoveStackM();
    ~ResoveStackM();
    void Close(void);

    bool InitSymbolResolver(HANDLE hProcess, CString& lastError);
    bool GetMethodName(void* ip, CStringA& methodName);
    bool GetManagedFileLineInfo(void* ip, CStringA& lineInfo);

    HMODULE mscordacwks_dll;
    CComPtr<IXCLRDataProcess> clrDataProcess;
    CComPtr<ICLRDataTarget> target;

    CComPtr<IDebugClient>       debugClient;
    CComQIPtr<IDebugControl>    debugControl;
    CComQIPtr<IDebugSymbols>    debugSymbols;
    CComQIPtr<IDebugSymbols3>   debugSymbols3;
};

//
// Typically applications don't need more than one instance of this. If you do, use your own copies.
//
extern ResoveStackM g_managedStackResolver;

ResolveStackM.cpp:

#include "ResolveStackM.h"
#include <Psapi.h>                      //EnumProcessModules
#include <string>                       //to_string
#pragma comment( lib, "dbgeng.lib" )


class CLRDataTarget : public ICLRDataTarget
{
public:
    ULONG refCount;
    bool bIsWow64;
    HANDLE hProcess;

    CLRDataTarget( HANDLE _hProcess, bool _bIsWow64 ) :
        refCount(1), 
        bIsWow64(_bIsWow64),
        hProcess(_hProcess)
    {
    }

    HRESULT STDMETHODCALLTYPE QueryInterface( REFIID riid, PVOID* ppvObject)
    {
        if ( IsEqualIID(riid, IID_IUnknown) || IsEqualIID(riid, __uuidof(ICLRDataTarget)) )
        {
            AddRef();
            *ppvObject = this;
            return S_OK;
        }

        *ppvObject = NULL;
        return E_NOINTERFACE;
    }

    ULONG STDMETHODCALLTYPE AddRef( void)
    {
        return ++refCount;
    }

    ULONG STDMETHODCALLTYPE Release( void)
    {
        refCount--;

        if( refCount == 0 )
            delete this;

        return refCount;
    }

    virtual HRESULT STDMETHODCALLTYPE GetMachineType( ULONG32 *machineType )
    {
        #ifdef _WIN64
            if (!bIsWow64)
                *machineType = IMAGE_FILE_MACHINE_AMD64;
            else
                *machineType = IMAGE_FILE_MACHINE_I386;
        #else
            *machineType = IMAGE_FILE_MACHINE_I386;
        #endif

        return S_OK;
    }

    virtual HRESULT STDMETHODCALLTYPE GetPointerSize( ULONG32* pointerSize )
    {
#ifdef _WIN64
    if (!bIsWow64)
#endif
        *pointerSize = sizeof(PVOID);
#ifdef _WIN64
    else
        *pointerSize = sizeof(ULONG);
#endif
        return S_OK;
    }

    virtual HRESULT STDMETHODCALLTYPE GetImageBase( LPCWSTR imagePath, CLRDATA_ADDRESS *baseAddress )
    {
        HMODULE dlls[1024] = { 0 };
        DWORD nItems = 0;
        wchar_t path[ MAX_PATH ];
        DWORD whatToList = LIST_MODULES_ALL;

        if( bIsWow64 )
            whatToList = LIST_MODULES_32BIT;

        if( !EnumProcessModulesEx( hProcess, dlls, sizeof(dlls), &nItems, whatToList ) )
        {
            DWORD err = GetLastError();
            return HRESULT_FROM_WIN32(err);
        }

        nItems /= sizeof(HMODULE);
        for( unsigned int i = 0; i < nItems; i++ )
        {
            path[0] = 0;
            if( GetModuleFileNameEx(hProcess, dlls[i], path, sizeof(path) / sizeof(path[0])) )
            {
                wchar_t* pDll = wcsrchr( path, L'\\');
                if (pDll) pDll++;

                if (_wcsicmp(imagePath, path) == 0 || _wcsicmp(imagePath, pDll) == 0)
                {
                    *baseAddress = (CLRDATA_ADDRESS) dlls[i];
                    return S_OK;
                }
            }
        }
        return E_FAIL;
    }

    virtual HRESULT STDMETHODCALLTYPE ReadVirtual( CLRDATA_ADDRESS address, BYTE *buffer, ULONG32 bytesRequested, ULONG32 *bytesRead )
    {
        SIZE_T readed;

        if( !ReadProcessMemory(hProcess, (void*)address, buffer, bytesRequested, &readed) )
            return HRESULT_FROM_WIN32( GetLastError() );

        *bytesRead = (ULONG32) readed;
        return S_OK;
    }

    virtual HRESULT STDMETHODCALLTYPE WriteVirtual( CLRDATA_ADDRESS address, BYTE *buffer, ULONG32 bytesRequested, ULONG32 *bytesWritten )
    {
        return E_NOTIMPL;
    }

    virtual HRESULT STDMETHODCALLTYPE GetTLSValue( ULONG32 threadID, ULONG32 index, CLRDATA_ADDRESS *value )
    {
        return E_NOTIMPL;
    }

    virtual HRESULT STDMETHODCALLTYPE SetTLSValue( ULONG32 threadID, ULONG32 index, CLRDATA_ADDRESS value )
    {
        return E_NOTIMPL;
    }

    virtual HRESULT STDMETHODCALLTYPE GetCurrentThreadID( ULONG32 *threadID )
    {
        return E_NOTIMPL;
    }

    virtual HRESULT STDMETHODCALLTYPE GetThreadContext( ULONG32 threadID, ULONG32 contextFlags, ULONG32 contextSize, BYTE *context )
    {
        return E_NOTIMPL;
    }

    virtual HRESULT STDMETHODCALLTYPE SetThreadContext( ULONG32 threadID, ULONG32 contextSize, BYTE *context)
    {
        return E_NOTIMPL;
    }

    virtual HRESULT STDMETHODCALLTYPE Request( ULONG32 reqCode, ULONG32 inBufferSize, BYTE *inBuffer, ULONG32 outBufferSize, BYTE *outBuffer)
    {
        return E_NOTIMPL;
    }
}; //CLRDataTarget





ResoveStackM::ResoveStackM() :
    mscordacwks_dll(0)
{

}

ResoveStackM::~ResoveStackM()
{
    Close();
}

void ResoveStackM::Close( void )
{
    clrDataProcess.Release();
    target.Release();
    debugClient.Release();

    if( mscordacwks_dll != 0 )
    {
        FreeLibrary(mscordacwks_dll);
        mscordacwks_dll = 0;
    }
}

bool ResoveStackM::InitSymbolResolver(HANDLE hProcess, CString& lastError)
{
    wchar_t path[ MAX_PATH ] = { 0 };

    // According to process hacker - mscoree.dll must be loaded before loading mscordacwks.dll.
    // It's enough if base application is managed.

    if( GetWindowsDirectoryW(path, sizeof(path)/sizeof(wchar_t) ) == 0 )
        return false;   //Unlikely to fail.

#ifdef _WIN64
    wcscat(path, L"\\Microsoft.NET\\Framework64\\v4.0.30319\\mscordacwks.dll");
#else
    wcscat(path, L"\\Microsoft.NET\\Framework\\v4.0.30319\\mscordacwks.dll");
#endif

    mscordacwks_dll = LoadLibraryW(path);
    PFN_CLRDataCreateInstance pCLRCreateInstance = 0;

    if( mscordacwks_dll != 0 )
        pCLRCreateInstance = (PFN_CLRDataCreateInstance) GetProcAddress(mscordacwks_dll, "CLRDataCreateInstance");

    if( mscordacwks_dll == 0 || pCLRCreateInstance == 0)
    {
        lastError.Format(L"Required dll mscordacwks.dll from .NET4 installation was not found (%s)", path);
        Close();
        return false;
    }

    BOOL isWow64 = FALSE;
    IsWow64Process(hProcess, &isWow64);
    target.Attach( new CLRDataTarget(hProcess, isWow64 != FALSE) );

    HRESULT hr = pCLRCreateInstance(__uuidof(IXCLRDataProcess), target, (void**)&clrDataProcess );

    if( FAILED(hr) )
    {
        lastError.Format(L"Failed to initialize mscordacwks.dll for symbol resolving (%08X)", hr);
        Close();
        return false;
    }

    hr = DebugCreate(__uuidof(IDebugClient), (void**)&debugClient);
    if (FAILED(hr))
    {
        lastError.Format(_T("Could retrieve symbolic debug information using dbgeng.dll (Error code: 0x%08X)"), hr);
        return false;
    }

    DWORD processId = GetProcessId(hProcess);
    const ULONG64 LOCAL_SERVER = 0;
    int flags = DEBUG_ATTACH_NONINVASIVE | DEBUG_ATTACH_NONINVASIVE_NO_SUSPEND;

    hr = debugClient->AttachProcess(LOCAL_SERVER, processId, flags);
    if (hr != S_OK)
    {
        lastError.Format(_T("Could attach to process 0x%X (Error code: 0x%08X)"), processId, hr);
        Close();
        return false;
    }

    debugControl = debugClient;

    hr = debugControl->SetExecutionStatus(DEBUG_STATUS_GO);
    if ((hr = debugControl->WaitForEvent(DEBUG_WAIT_DEFAULT, INFINITE)) != S_OK)
    {
        return false;
    }

    debugSymbols3 = debugClient;
    debugSymbols  = debugClient;
    // if debugSymbols3 == NULL - GetManagedFileLineInfo will not work
    return true;
} //Init

struct ImageInfo
{
    ULONG64 modBase;
};

// Based on a native offset, passed in the first argument this function
// identifies the corresponding source file name and line number.
bool ResoveStackM::GetManagedFileLineInfo( void* ip, CStringA& lineInfo )
{
    ULONG lineN = 0;
    char path[MAX_PATH];
    ULONG64 dispacement = 0;

    CComPtr<IXCLRDataMethodInstance> method;
    if (!debugSymbols || !debugSymbols3)
        return false;

    // Get managed method by address
    CLRDATA_ENUM methEnum;
    HRESULT hr = clrDataProcess->StartEnumMethodInstancesByAddress((ULONG64)ip, NULL, &methEnum);
    if( hr == S_OK )
    {
        hr = clrDataProcess->EnumMethodInstanceByAddress(&methEnum, &method);
        clrDataProcess->EndEnumMethodInstancesByAddress(methEnum);
    }

    if (!method)
        goto lDefaultFallback;

    ULONG32 ilOffsets = 0;
    hr = method->GetILOffsetsByAddress((CLRDATA_ADDRESS)ip, 1, NULL, &ilOffsets);

    switch( (long)ilOffsets )
    {
        case CLRDATA_IL_OFFSET_NO_MAPPING:
            goto lDefaultFallback;

        case CLRDATA_IL_OFFSET_PROLOG:
            // Treat all of the prologue as part of the first source line.
            ilOffsets = 0;
            break;

        case CLRDATA_IL_OFFSET_EPILOG:
        {
            // Back up until we find the last real IL offset.
            CLRDATA_IL_ADDRESS_MAP mapLocal[16];
            CLRDATA_IL_ADDRESS_MAP* map = mapLocal;
            ULONG32 count = _countof(mapLocal);
            ULONG32 needed = 0;

            for( ; ; )
            {
                hr = method->GetILAddressMap(count, &needed, map);

                if ( needed <= count || map != mapLocal)
                    break;

                map = new CLRDATA_IL_ADDRESS_MAP[ needed ];
            }

            ULONG32 highestOffset = 0;
            for (unsigned i = 0; i < needed; i++)
            {
                long l = (long) map[i].ilOffset;

                if (l == CLRDATA_IL_OFFSET_NO_MAPPING || l == CLRDATA_IL_OFFSET_PROLOG || l == CLRDATA_IL_OFFSET_EPILOG )
                    continue;

                if (map[i].ilOffset > highestOffset )
                    highestOffset = map[i].ilOffset;
            } //for

            if( map != mapLocal )
                delete[] map;

            ilOffsets = highestOffset;
        }
        break;
    } //switch

    mdMethodDef methodToken;
    void* moduleBase = 0;
    {
        CComPtr<IXCLRDataModule> module;

        hr = method->GetTokenAndScope(&methodToken, &module);
        if( !module )
            goto lDefaultFallback;

        //
        // Retrieve ImageInfo associated with the IXCLRDataModule instance passed in. First look for NGENed module, second for IL modules.
        //
        for (int extentType = CLRDATA_MODULE_PREJIT_FILE; extentType >= CLRDATA_MODULE_PE_FILE; extentType--)
        {
            CLRDATA_ENUM enumExtents;
            if (module->StartEnumExtents(&enumExtents) != S_OK )
                continue;

            CLRDATA_MODULE_EXTENT extent;
            while (module->EnumExtent(&enumExtents, &extent) == S_OK)
            {
                if (extentType != extent.type )
                    continue;

                ULONG startIndex = 0;
                ULONG64 modBase = 0;

                hr = debugSymbols->GetModuleByOffset((ULONG64) extent.base, 0, &startIndex, &modBase);
                if( FAILED(hr) )
                    continue;

                moduleBase = (void*)modBase;

                if (moduleBase )
                    break;
            }
            module->EndEnumExtents(enumExtents);

            if( moduleBase != 0 )
                break;
        } //for
    } //module scope

    DEBUG_MODULE_AND_ID id;
    DEBUG_SYMBOL_ENTRY symInfo;
    hr = debugSymbols3->GetSymbolEntryByToken((ULONG64)moduleBase, methodToken, &id);
    if( FAILED(hr) )
        goto lDefaultFallback;

    hr = debugSymbols3->GetSymbolEntryInformation(&id, &symInfo);
    if (FAILED(hr))
        goto lDefaultFallback;

    char* IlOffset = (char*)symInfo.Offset + ilOffsets;

    //
    // Source maps for managed code can end up with special 0xFEEFEE markers that
    // indicate don't-stop points.  Try and filter those out.
    //
    for (ULONG SkipCount = 64; SkipCount > 0; SkipCount--)
    {
        hr = debugSymbols3->GetLineByOffset((ULONG64)IlOffset, &lineN, path, sizeof(path), NULL, &dispacement );
        if( FAILED( hr ) )
            break;

        if (lineN == 0xfeefee)
            IlOffset++;
        else
            goto lCollectInfoAndReturn;
    }

    if( !FAILED(hr) )
        // Fall into the regular translation as a last-ditch effort.
        ip = IlOffset;

lDefaultFallback:
    hr = debugSymbols3->GetLineByOffset((ULONG64) ip, &lineN, path, sizeof(path), NULL, &dispacement);

    if( FAILED(hr) )
        return false;

lCollectInfoAndReturn:
    lineInfo += path;
    lineInfo += "(";
    lineInfo += std::to_string((_ULonglong) lineN).c_str();
    lineInfo += "): ";
    return true;
}


bool ResoveStackM::GetMethodName(void* ip, CStringA& symbol)
{
    symbol.Empty();

    GetManagedFileLineInfo(ip, symbol);

    USES_CONVERSION;
    CLRDATA_ADDRESS displacement = 0;
    ULONG32 len = 0;
    wchar_t name[1024];
    if (!clrDataProcess )
        return false;

    HRESULT hr = clrDataProcess->GetRuntimeNameByAddress( (CLRDATA_ADDRESS)ip, 0, sizeof(name) / sizeof(name[0]), &len, name, &displacement );

    if( FAILED( hr ) )
        return false;

    name[ len ] = 0;
    symbol += W2A(name);
    return true;
} //GetMethodName



ResoveStackM g_managedStackResolver;

So far tested only with some smaller piece of code, only 64-bit (doubt that 32-bit works at all - I don't have call stack determination yet for it).

It's possible that this code contains bugs, but I'll try to haunt them down and fix them.

I harvested so much code that please mark this answer as useful. :-)

like image 130
TarmoPikaro Avatar answered Sep 22 '22 03:09

TarmoPikaro