
Azure instances 0 to 3 not writing diagnostics data to WADPerformanceCountersTable

I am trying to query data from the Azure WADPerformanceCountersTable.

I am trying to get the last 5 minutes of data.

The problem is that I only get data from instances 4, 5 and 6, but not from instances 0, 1, 2 and 3.

The script I am using to pull the data is this:

            // Connect to the storage account that receives the diagnostics data.
            Microsoft.WindowsAzure.CloudStorageAccount storageAccount = Microsoft.WindowsAzure.CloudStorageAccount.Parse(AppDefs.CloudStorageAccountConnectionString);
            CloudTableClient cloudTableClient = storageAccount.CreateCloudTableClient();
            TableServiceContext serviceContext = cloudTableClient.GetDataServiceContext();
            IQueryable<PerformanceCountersEntity> traceLogsTable = serviceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable");
            // Keep only rows whose PartitionKey is at or after "0" + ticks of (now - timespanInMinutes),
            // restricted to this deployment and to the CPU counter.
            var selection = from row in traceLogsTable
                            where row.PartitionKey.CompareTo("0" + DateTime.UtcNow.AddMinutes(-timespanInMinutes).Ticks) >= 0
                            && row.DeploymentId == deploymentId
                            && row.CounterName == @"\Processor(_Total)\% Processor Time"
                            select row;
            CloudTableQuery<PerformanceCountersEntity> query = selection.AsTableServiceQuery<PerformanceCountersEntity>();
            IEnumerable<PerformanceCountersEntity> result = query.Execute();
            return result;
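
The PartitionKey filter in the query above works because the keys are compared as strings, so they need a fixed width for the string order to match the time order. Below is a minimal sketch of building such a lower bound. It assumes the key is the UTC tick count left-padded with zeros to 19 characters, which for present-day dates yields the same string as "0" + Ticks used in the query. The class and method names (WadPartitionKey, FromUtc, LowerBound) are hypothetical and are not part of the Azure SDK.

```csharp
using System;
using System.Globalization;

public static class WadPartitionKey
{
    // Assumption: the PartitionKey is the UTC tick count, zero-padded to 19 characters.
    // For dates with an 18-digit tick count this equals "0" + ticks.
    public static string FromUtc(DateTime utcTime)
    {
        return utcTime.Ticks.ToString("D19", CultureInfo.InvariantCulture);
    }

    // Lower bound for "rows written in the last minutesBack minutes".
    public static string LowerBound(DateTime utcNow, int minutesBack)
    {
        return FromUtc(utcNow.AddMinutes(-minutesBack));
    }
}
```

With this helper, the filter could compare row.PartitionKey against WadPartitionKey.LowerBound(DateTime.UtcNow, timespanInMinutes) instead of concatenating the string inline.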

My diagnostics.wadcfg file is this:

<?xml version="1.0" encoding="utf-8" ?>
<DiagnosticMonitorConfiguration xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration" configurationChangePollInterval="PT1M" overallQuotaInMB="4096">
  <PerformanceCounters bufferQuotaInMB="0" scheduledTransferPeriod="PT5M">
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available Bytes" sampleRate="PT60S" />
    <PerformanceCounterConfiguration counterSpecifier="\Processor(_Total)\% Processor Time" sampleRate="PT60S" />    
  </PerformanceCounters>
</DiagnosticMonitorConfiguration>

EDIT: Also, I have this code deployed on a test environment in Azure, and it works just fine.

EDIT 2: Updated to include the service definition XML:

<ServiceDefinition name="MyApp.Azure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" schemaVersion="2012-05.1.7">
  <WebRole name="MyApp.Website" vmsize="ExtraSmall">
    <Sites>
      <Site name="Web">
        <Bindings>
          <Binding name="Endpoint1" endpointName="Endpoint1" />
        </Bindings>
      </Site>
    </Sites>
    <Endpoints>
      <InputEndpoint name="Endpoint1" protocol="http" port="80" />
    </Endpoints>
    <Imports>
      <Import moduleName="Diagnostics" />
    </Imports>
  </WebRole>
  <WorkerRole name="MyApp.Cache" vmsize="ExtraSmall">
    <Imports>
      <Import moduleName="Diagnostics" />
      <Import moduleName="Caching" />
    </Imports>
    <LocalResources>
      <LocalStorage name="Microsoft.WindowsAzure.Plugins.Caching.FileStore" sizeInMB="1000" cleanOnRoleRecycle="false" />
    </LocalResources>
  </WorkerRole>
</ServiceDefinition>

After reading @Igorek's answer, I have included my ServiceDefinition.csdef configuration XML. I am still not sure how to configure the LocalResources > LocalStorage part of the configuration. The configuration must be set for "MyApp.Website".

EDIT 3: I have made these changes on the test Azure account.

I have set this in ServiceDefinition.csdef:

<LocalResources>
    <LocalStorage name="DiagnosticStore" sizeInMB="4096" cleanOnRoleRecycle="false"/>
</LocalResources>    

And I have lowered the OverallQuota and BufferQuota in diagnostics.wadcfg. In the end, in the wad-control-container I have this configuration per instance: http://pastebin.com/aUywLUfE

I will have to put this on the live account to see the results.

FINAL EDIT: Apparently the overall quota was the problem, although I cannot be certain.

In the end, after a new publish I noticed this:

  • one role instance had a configuration XML in wad-control-container with an overall quota of 1024MB and a BufferQuotaInMB of 1024MB; this was correct.
  • two other role instances had an overall quota of 4080MB and a BufferQuotaInMB of 500MB; this was incorrect, and they were not writing to the WADPerformanceCountersTable.
  • the XML configuration files (in wad-control-container) belonging to these role instances had been deleted before the new publish.
  • the configuration file diagnostics.wadcfg was configured correctly: 1024MB everywhere.

So I think there is a problem with their publisher.

Two solutions were tried:

  1. I deleted one incorrect XML from wad-control-container and rebooted the machine. The XML was rewritten and the role instance started to write to the WADPerformanceCountersTable.

  2. I used the script below on the other incorrect instance, and that role instance also started to write to the WADPerformanceCountersTable.

            var storageAccount = CloudStorageAccount.Parse(AppDefs.CloudStorageAccountConnectionString);
            DeploymentDiagnosticManager diagManager = new DeploymentDiagnosticManager(storageAccount, deploymentId);
            IEnumerable<RoleInstanceDiagnosticManager> instanceManagers = diagManager.GetRoleInstanceDiagnosticManagersForRole(roleName);

            foreach (var roleInstance in instanceManagers)
            {
                DiagnosticMonitorConfiguration currentConfiguration = roleInstance.GetCurrentConfiguration();
                TimeSpan configurationChangePollInterval = TimeSpan.FromSeconds(60);
                // IsCurrentConfigurationCorrect is my own helper (not shown here); rewrite the
                // stored configuration only when it does not match the expected values.
                if (!IsCurrentConfigurationCorrect(currentConfiguration, overallQuotaInMb, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1)))
                {
                    // Add a performance counter for processor time.
                    PerformanceCounterConfiguration pccCPU = new PerformanceCounterConfiguration();
                    pccCPU.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
                    pccCPU.SampleRate = TimeSpan.FromSeconds(60);

                    // Add a performance counter for available memory.
                    PerformanceCounterConfiguration pccMemory = new PerformanceCounterConfiguration();
                    pccMemory.CounterSpecifier = @"\Memory\Available Bytes";
                    pccMemory.SampleRate = TimeSpan.FromSeconds(60);

                    // Apply the expected poll interval and quotas, then register both counters.
                    currentConfiguration.ConfigurationChangePollInterval = configurationChangePollInterval;
                    currentConfiguration.OverallQuotaInMB = overallQuotaInMb;
                    currentConfiguration.PerformanceCounters.BufferQuotaInMB = overallQuotaInMb;
                    currentConfiguration.PerformanceCounters.DataSources.Add(pccCPU);
                    currentConfiguration.PerformanceCounters.DataSources.Add(pccMemory);
                    // Write the corrected configuration back for this role instance.
                    roleInstance.SetCurrentConfiguration(currentConfiguration);
                }
            }

Also, I keep receiving this error from time to time: "The configuration file is missing a diagnostic connection string for one or more roles."

In the end I will accept the current response as the answer, because I have found the problem. Unfortunately, I have not found its cause. At every publish I risk getting a changed configuration XML.

Dragos Durlut asked Jun 10 '13 09:06


1 Answer

Seeing how your first instances are not transferring data to diagnostics while the later instances do, one possible reason is as follows:

The local diagnostic store on your servers has filled up with diagnostic data, and Azure can no longer transfer data out of the local store to storage. Make sure that the space allocated to DiagnosticStore in the role configuration (under Local Storage) is bigger than the buffer quota allocated in diagnostics.wadcfg.

Detailed explanation: I've experienced this first-hand with a number of customers, so the following is my own interpretation based on comments from Microsoft support. The Azure Diagnostics API does not clean up local storage according to the BufferQuota until that quota is exceeded. DiagnosticStore in a cloud project defaults to the same size as the BufferQuota used in all of the examples (4096 MB). What's happening is that your buffer usage gets very close to 4096 MB without reaching the limit, so the Diagnostics API never kicks off a purge. At the same time, diagnostic data can no longer be captured properly, because local storage is nearly full and the Azure host stops the app from writing to DiagnosticStore.

Your other servers should stop writing diagnostic data as soon as their local storage fills up as well.

Hope this makes sense.

Editing my reply to precisely point out the changes for anyone reading later:

The simplest approach is to lower the OverallQuotaInMB specified in diagnostics.wadcfg to something like 4000 (make sure that all other buffers combined do not exceed this number).

Alternatively, or additionally, one can manually specify the space allocated to the diagnostic store on the VM using the LocalStorage setting in the .csdef file. This link shows how: http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.diagnostics.diagnosticmonitorconfiguration.overallquotainmb.aspx
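
The two adjustments above boil down to a pair of inequalities between the DiagnosticStore size, the overall quota and the individual buffer quotas. As a rough sketch, they could be checked before publishing with a small helper like the one below. DiagnosticsQuotaCheck and Validate are hypothetical names, not part of the Azure SDK, and the rules only encode the rule of thumb from this answer, not a documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiagnosticsQuotaCheck
{
    // Returns a list of problems; an empty list means the numbers follow the rule of thumb.
    // diagnosticStoreSizeInMb: sizeInMB of the DiagnosticStore LocalStorage entry in the .csdef
    // overallQuotaInMb:        overallQuotaInMB from diagnostics.wadcfg
    // bufferQuotasInMb:        every bufferQuotaInMB value from diagnostics.wadcfg
    public static IList<string> Validate(
        int diagnosticStoreSizeInMb,
        int overallQuotaInMb,
        IEnumerable<int> bufferQuotasInMb)
    {
        var problems = new List<string>();
        int bufferTotal = bufferQuotasInMb.Sum();

        // Assumed rule 1: leave headroom between the overall quota and the local store.
        if (overallQuotaInMb >= diagnosticStoreSizeInMb)
        {
            problems.Add("OverallQuotaInMB should be smaller than the DiagnosticStore size.");
        }

        // Assumed rule 2: all buffers combined must fit inside the overall quota.
        if (bufferTotal > overallQuotaInMb)
        {
            problems.Add("The buffer quotas combined exceed OverallQuotaInMB.");
        }

        return problems;
    }
}
```

For example, Validate(4096, 4096, new[] { 0 }) reports one problem for a configuration where the overall quota equals the store size, while Validate(4096, 4000, new[] { 1024 }) reports none. Passing this check does not prove that data will be transferred; it only catches the mismatch described here.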

Igorek answered Oct 20 '22 16:10