Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Windows Azure Worker Role not getting past first line of code

I have a worker role that works perfectly in development but doesn't work when deployed. "Doesn't work" is rather vague, but that's really all I have to go on as I'm not seeing any errors or anything (in the event log anyway - maybe there is somewhere else I can look). I have added some trace statements to my code and I'm seeing the first one come out, but none of the others.

WorkerRole Code:

public class WorkerRole : RoleEntryPoint
{
    #region Member variables

    private IWindsorContainer _container;

    private IJob[] _jobs;

    #endregion

    #region Methods

    public override bool OnStart()
    {
        ConfigureDiagnostics();

        Trace.WriteLine("WorkerRole.OnStart()");

        try
        {
            Initialize();

            Trace.WriteLine("Resolving jobs...");
            _jobs = _container.ResolveAll<IJob>();

            StartJobs();

            return base.OnStart();
        }
        catch (Exception ex)
        {
            TraceUtil.TraceException(ex);
            throw;
        }
        finally
        {
            Trace.WriteLine("WorkerRole.OnStart - Complete");
            Trace.Flush();
        }
    }

    /// <summary>
    /// Sets up diagnostics.
    /// </summary>
    private void ConfigureDiagnostics()
    {
        DiagnosticMonitorConfiguration dmc =
            DiagnosticMonitor.GetDefaultInitialConfiguration();

        dmc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
        dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;

        DiagnosticMonitor.Start(Constants.DiagnosticsConnectionString, dmc);
    }

    /// <summary>
    /// Sets up the IoC container etc.
    /// </summary>
    private void Initialize()
    {
        Trace.WriteLine("WorkerRole.Initialize()");

        try
        {
            Trace.WriteLine("Configuring AutoMapper...");
            AutoMapperConfiguration.Configure();

            Trace.WriteLine("Configuring Windsor...");
            _container = new WindsorContainer();

            Trace.WriteLine(string.Format("Installing assemblies from directory...{0}", 
                Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot)));

            _container.Install(FromAssembly.InDirectory(
                new AssemblyFilter(Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot))));

            Trace.WriteLine(string.Format("Setting the default connection limit..."));
            ServicePointManager.DefaultConnectionLimit = 12;
        }
        finally
        {
            Trace.WriteLine("WorkerRole.Initialize - Complete");
        }
    }

    /// <summary>
    /// Starts all of the jobs.
    /// </summary>
    private void StartJobs()
    {
        Trace.WriteLine("WorkerRole.StartJobs()");

        try
        {
            foreach (IJob job in _jobs)
            {
                job.Start();
            }
        }
        finally
        {
            Trace.WriteLine("WorkerRole.StartJobs - Complete");
        }
    }

    public override void OnStop()
    {
        Trace.WriteLine("WorkerRole.OnStop()");

        try
        {
            foreach (IJob job in _jobs)
            {
                job.Stop();
            }
            _container.Dispose();
        }
        finally
        {
            Trace.WriteLine("WorkerRole.OnStop - Complete");
        }
    }

    #endregion

    #region Private util classes

    public static class AutoMapperConfiguration
    {
        public static void Configure()
        {
            Mapper.Initialize(x => x.AddProfile<ModelProfile>());
        }
    }

    #endregion
}

TraceUtil Code:

public static class TraceUtil
{
    public static void TraceException(Exception ex)
    {
        StringBuilder buffer = new StringBuilder();

        while (ex != null)
        {
            buffer.AppendFormat("{0} : ", ex.GetType());
            buffer.AppendLine(ex.Message);
            buffer.AppendLine(ex.StackTrace);

            ex = ex.InnerException;
        }
        Trace.TraceError(buffer.ToString());
    }
}

Config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  ...
  <system.diagnostics>
    <trace autoflush="true">
      <listeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
             name="AzureDiagnostics">
          <filter type="" />
        </add>
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>

Once the worker has started, if I look in the WADLogsTable, all I see is "WorkerRole.OnStart()" and nothing else!

Any ideas on what the problem could be or how to troubleshoot this would be appreciated.

Update: If I stop the role, I don't see any of the debug statements from the OnStop() method either.

Update: I must have something configured incorrectly with my diagnostics. I thought I was seeing my debug come out correctly when debugging locally, but it turns out I'm not. I see everything in the output window, but I don't see everything in the storage table. I am seeing the following entries in development:

WorkerRole.OnStart()
WorkerRole.Initialize()
Configuring AutoMapper...

I realize that the trace output is only periodically uploaded, but I've waited 5 minutes or so, so I think this should be long enough, since I have it set to 1 minute.

Update: As suggested by @kwill in the comments section I have tried adding a file trace listener as follows:

  <system.diagnostics>
    <trace autoflush="true">
      <listeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
             name="AzureDiagnostics">
        </add>
        <add name="File" type="System.Diagnostics.TextWriterTraceListener" initializeData="C:\TextWriterOutput.log" />
      </listeners>
    </trace>
  </system.diagnostics>

This works fine in my development environment and seems more reliable as well as I get all of the debug out that I would expect. When I deploy it to staging however, the TextWriterOutput.log file isn't even created!

I really need a reliable way of getting debug out of my worker role so that I can troubleshoot the ultimate problem, which is that my jobs aren't working - at this point I still have no idea what they are even trying to do as I can't get any debug out!

Update: I am pretty sure that the missing dll idea suggested by most people is not the problem. To hopefully prove this, I have overridden the run method as shown below and I am seeing the "Heartbeat..." debug come out. It seems to me that either the diagnostics functionality or at least the way I have it configured is unreliable, which is preventing me from investigating why on earth my jobs aren't running.

    public override void Run()
    {
        Trace.WriteLine("LearningMiles.JobProcessor.WorkerRole.Run()", "Information");

        try
        {
            while (true)
            {
                Thread.Sleep(10000);
                Trace.WriteLine("Heartbeat...", "Verbose");
            }
        }
        catch (Exception ex)
        {
            TraceUtil.TraceException(ex);
            throw;
        }
        finally
        {
            Trace.WriteLine("LearningMiles.JobProcessor.WorkerRole.Run() - Complete", "Information");
        }
    }

Update: I have now cross posted this problem on the Windows Azure MSDN forum.

Update: As suggested in the comments, I have now tried removing all 'useful' code. In development this resulted in all of the debug being output. I then tried just removing the call to AutomapperConfiguration.Configure() since previously I was seeing nothing come out after that call. This resulted in some of the trace statements not coming out again. Importantly however, I was seeing the trace statements that I have put in the "jobs". Since it's the jobs not running that I ultimately want to resolve, I deployed that version of the code to staging, but there I just see the OnStart() trace and the "heartbeat" trace. I don't think this really helps, but maybe it will give someone some ideas.

like image 758
s1mm0t Avatar asked Feb 23 '12 09:02

s1mm0t


3 Answers

Given the OnStart() trace is called, but not the Initialize(), my guess is that one of the assemblies referenced by the code in Initialize() is not being copied to the deployment. Remember that .Net JIT-compiles one method at a time, and because of that behavior it would make sense that the OnStart trace message shows up (as there's little other than the Windows Azure and standard .Net framework assemblies referenced up to that point). However, when the CLR goes to JIT the Initialize method, it then tries to load several third-party assemblies (AutoMapper and Windsor) that may not be packaged correctly, but could be GACced or otherwise available locally when the emulator runs.

A couple of things to try:

  1. Manually "Package" your deployment from Visual Studio and take a careful look at the build output. Many times, VS will catch your missing assemblies and tell you (unfortunately as a warning, not an error) that you're missing something.
  2. If you don't see anything in the output that looks obvious, take a look at the cspkg file itself (remember it's just a ZIP file with more ZIP files in it) and make sure any referenced assemblies your application/role needs are in there. Alternately, connect to the VM and check the approot for those assemblies.
  3. You may be able to find an entry in the VM's event log that shows that your application failed to load an assembly.
like image 120
Doug Rohrer Avatar answered Oct 18 '22 22:10

Doug Rohrer


More often than not, the root cause of problems like this is missing dependencies. There are already good suggestions on this front in previous answers.

The trace logs are transferred to Azure storage once per minute as per your configuration. If your worker process crashes, you might lose some of the last trace messages. To work around this, try adding a Thread.Sleep(TimeSpan.FromMinutes(2)) in your exception handlers to ensure that the exception log gets flushed to storage.

Finally, if everything else fails, I suggest you try debugging your role with WinDbg. Enable Remote Desktop for your role. Log in to the role and switch off IE safe browsing so that you can install stuff. Then download and install Debugging Tools for Windows from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=8279 . This package contains the entire Windows SDK, but you can select to install the Debugging Tools for Windows only.

Then run WinDbg and attach to WaWorkerHost.exe. In WinDbg, do

.loadby sos clr   // load the SOS extension that allows you to do managed debugging
sxe clr           // break on CLR exceptions
g                 // continue

WinDbg should now break execution when there is an CLR exception. When it breaks, execute

!PrintException

to see exception details. You may want to add another Thread.Sleep call in your role startup to give you time to attach the debugger before the process exits.

like image 36
Sami Avatar answered Oct 18 '22 20:10

Sami


Thanks to an answer on the MSDN forum, I have been able to troubleshoot and resolve my problem.

The reason why my jobs were not being executed was due to the line below:

_container.Install(FromAssembly.InDirectory(
                    new AssemblyFilter(Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot))));

The role root on staging is E:. Path.Combine() has an obscure implementation which you can read more about in this SO answer. What this meant was that Castle was searching for assemblies in E:approot rather than E:\approot as I expected. I am now constructing the approot path with the method below:

    private string GetAppRoot()
    {
        string root = Environment.GetEnvironmentVariable(Constants.RoleRoot);

        if (root.EndsWith(Path.VolumeSeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;

        return Path.Combine(root, Constants.AppRoot);
    }

That has resolved my main issue and I am now seeing the jobs execute as expected.

I was only able to troubleshoot this issue by running the worker role in an elevated execution context so that my trace data could be written to a file. I still don't know why, and would like to hear of any ideas why, the trace statements were not being transferred to storage correctly.

like image 1
s1mm0t Avatar answered Oct 18 '22 20:10

s1mm0t