How to handle NULLS: C#, Microsoft SDS 1.3 and NetCDF files

Tags:

c#

netcdf

I am writing a C# program that uses the Microsoft Scientific DataSet (SDS) library to read NetCDF files.

using System;
using System.Globalization;
using sds = Microsoft.Research.Science.Data;
using Microsoft.Research.Science.Data.Imperative;


namespace NetCDFConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Open the dataset from file in read-only mode.
            var dataset = sds.DataSet.Open("E:\\Temp\\test.nc?openMode=readOnly");

            // Get the starting date/time string from the metadata.
            string dt = (string)dataset.Metadata["START_DATE"];

            // Load the whole ACPR variable into a [Time, south_north, west_east] array.
            // This is the call that fails when a file holds fewer than 85 hours.
            float[,,] dataValues = dataset.GetData<float[,,]>("ACPR");

            // Parse the start DateTime from the metadata string.
            DateTime dt2 = DateTime.ParseExact(dt, "yyyy-MM-dd_HH:mm:ss", CultureInfo.InvariantCulture);

            // Latitude (south_north) index ranges from 0 to 212; East Cape is ~125-144.
            for (int iLatitude = 137; iLatitude < 138; iLatitude++)
            {
                // Longitude (west_east) index ranges from 0 to 164; East Cape is ~125-150.
                for (int iLongitude = 133; iLongitude < 134; iLongitude++)
                {
                    // There are normally 85 hours of data in a file, but not always.
                    for (int iTime = 0; iTime < 65; iTime++)
                    {
                        // Get each data point.
                        float? thisValue = dataValues[iTime, iLatitude, iLongitude];

                        // Write it to the console, incrementing the DateTime by the hour index.
                        Console.WriteLine(dt + "," + dt2 + "," + iTime + "," + dt2.AddHours(iTime) + "," + thisValue);
                    }
                }
            }

            Console.ReadLine();
        }
    }
}

The files contain predicted rainfall data over a map grid (X,Y). Each grid reference should have 85 hours' worth of data.

E:\temp>sds list test.nc
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
[1] Times of type SByte (Time:85) (DateStrLen:19)

But occasionally a file has fewer (say 60-70 hours). When that happens, my C# program fails when importing the data:

var dataset = sds.DataSet.Open("test.nc?openMode=readOnly");
Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR");

I can reproduce the error with the command line.

Here I can successfully extract hours 60-65 for grid cell XY: 125,130. The last value I have in this file is at Time=69.

E:\temp>sds data test.nc ACPR[60:65,125:125,130:130]
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
                Name = ACPR
         description = ACCUMULATED TOTAL GRID SCALE PRECIPITATION
         MemoryOrder = XY
         coordinates = XLONG XLAT XTIME
             stagger =
           FieldType = 104
               units = mm

[60,125,130]  13.4926
[61,125,130] 15.24556
[62,125,130]  16.3638
[63,125,130] 17.39618
[64,125,130] 20.00507
[65,125,130] 23.57192

If I try to read past hour 69, I get the following error:

E:\temp>sds data test.nc ACPR[60:70,125:125,130:130]
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
                Name = ACPR
         description = ACCUMULATED TOTAL GRID SCALE PRECIPITATION
         MemoryOrder = XY
         coordinates = XLONG XLAT XTIME
             stagger =
           FieldType = 104
               units = mm

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at nc_get_vara_float(Int32 , Int32 , UInt64* , UInt64* , Single* )
   at NetCDFInterop.NetCDF.nc_get_vara_float(Int32 ncid, Int32 varid, IntPtr[] start, IntPtr[] count, Single[] data)
   at Microsoft.Research.Science.Data.NetCDF4.NetCdfVariable`1.ReadData(Int32[] origin, Int32[] shape)
   at sdsutil.Program.PrintData(Variable v, String range, String format)
   at sdsutil.Program.DoData(String uri, String[] args)
   at sdsutil.Program.Main(String[] args)

E:\temp>

If the file contains the full 85 hours, I can request Time 0-100 and it still gives me the 85 values without error.

I am convinced that the issue is NULL/missing data. Is there some way to specify, when importing the data, that only non-null values should be read? Or should I use some sort of try/catch?

Something like this (pseudocode):

Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR"); // ...where it's not blank

Thanks.

Edit: I am beginning to suspect that the file isn't well-formed. In the SDS viewer, the metadata for a good file vs. a bad file looks like this:

[Screenshot: good file metadata in the SDS viewer]

[Screenshot: bad file metadata in the SDS viewer]

Yet the command line shows the metadata as being the same for both.

E:\temp>sds good.nc
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
[1] Times of type SByte (Time:85) (DateStrLen:19)

E:\temp>sds bad.nc
[2] ACPR of type Single (Time:85) (south_north:213) (west_east:165)
[1] Times of type SByte (Time:85) (DateStrLen:19)

E:\temp>
Asked Dec 18 '17 02:12 by Sir Swears-a-lot


2 Answers

Peter,

Since the error occurs in ReadData(Int32[] origin, Int32[] shape), as you pointed out, I see two possible solutions.

Before delving into the solutions, you need to decide whether missing data can be treated as 0.0 or must be treated as missing. If missing is different from 0.0 and null is unacceptable, missing values could be encoded as -1.0. Using -1.0 as the marker for missing data assumes that a negative rainfall value is impossible.
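A minimal sketch of that -1.0 encoding, assuming (as above) that real rainfall is never negative. The class `MissingRain` and its members are hypothetical helpers, not part of SDS. A 0.0 reading is kept as a genuine "no rain" value, so only the sentinel means missing.

```csharp
// Hypothetical helper: encodes "missing" as a -1.0 sentinel and decodes it back to null.
// Assumes a genuine rainfall value can never be negative.
public static class MissingRain
{
    public const float Sentinel = -1.0f;

    // Store a possibly-missing reading in a plain float array.
    public static float Encode(float? value) => value ?? Sentinel;

    // Read it back: any negative value is treated as missing; 0.0 stays a real reading.
    public static float? Decode(float stored) => stored < 0f ? (float?)null : stored;
}
```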

If the result, dataValues, can contain nulls, all you may need to do is replace float with float? in the line:

float thisValue = dataValues[iTime,iLatitude,iLongitude];

to be:

float? thisValue = dataValues[iTime,iLatitude,iLongitude]; 

And if float? gets you home free, this was a happy solution. (You still need to decide how to handle null values.)
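One way to make that decision explicit is to write missing readings as a marker in the comma-separated output the question's loop produces. The `NA` marker and the `ReadingFormat.FormatReading` name are hypothetical choices, not anything SDS defines; the invariant culture keeps '.' as the decimal separator regardless of machine locale.

```csharp
using System.Globalization;

public static class ReadingFormat
{
    // Hypothetical marker written in place of a missing (null) reading.
    public const string MissingMarker = "NA";

    // Formats a nullable reading for CSV output.
    public static string FormatReading(float? value) =>
        value.HasValue ? value.Value.ToString(CultureInfo.InvariantCulture) : MissingMarker;
}
```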

Otherwise possible solution 1)

After the call Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR");, make sure that the Time dimension of dataValues (the first one, dataValues.GetLength(0)) has size 85. Potentially GetData(..) does not populate all 85 time steps, especially if the file holds fewer than 85 hours. Then, if need be, manually fill the missing entries with 0's or -1.0's.
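A sketch of that check-and-fill, assuming dataValues is shaped [Time, south_north, west_east] as in the `sds list` output. `TimePadding.PadTime` is a hypothetical helper, and it can only help when GetData(..) actually returns; in the question the exception is thrown inside the read itself.

```csharp
// Hypothetical helper: copies a [time, lat, lon] array into one with
// expectedHours time steps, filling the missing hours with a fill value.
public static class TimePadding
{
    public static float[,,] PadTime(float[,,] data, int expectedHours, float fill)
    {
        int hours = data.GetLength(0);
        int lats = data.GetLength(1);
        int lons = data.GetLength(2);

        // Already complete (or longer): return the original array unchanged.
        if (hours >= expectedHours)
            return data;

        var padded = new float[expectedHours, lats, lons];
        for (int t = 0; t < expectedHours; t++)
            for (int y = 0; y < lats; y++)
                for (int x = 0; x < lons; x++)
                    padded[t, y, x] = t < hours ? data[t, y, x] : fill;
        return padded;
    }
}
```

For example, TimePadding.PadTime(dataValues, 85, -1.0f) would return an array with 85 time steps, where hours beyond the last one read hold -1.0.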

Then, when you retrieve the data, handle nulls, 0's, or -1.0's appropriately:

float? thisValue = dataValues[iTime, iLatitude, iLongitude];
// Decide what a null, 0.0, or -1.0 value means here; for missing data
// you could simply continue with the next iteration:
if (thisValue == null || thisValue == -1.0f)
    continue;

Possible solution 2)

If you own the GetData(..) method used in Single[,,] dataValues = dataset.GetData<Single[,,]>("ACPR");, make GetData(..) itself provide all 85 values, with missing values given as nulls, 0's, or -1.0's. Then, when you retrieve the data, handle nulls, 0's, or -1.0's appropriately.

Cheers,

Avi

Answered Nov 16 '22 14:11 by AviFarah


I recommend you try this since you don't know the data type it's trying to return:

Object[,,] dataValues = dataset.GetData<object[,,]>("ACPR");

Then you can check if you have a valid float in the loop.

// Only try to convert entries that are present (calling ToString() on null would throw).
if (dataValues[iTime, iLatitude, iLongitude] != null)
{
    float floatValue;
    if (Single.TryParse(dataValues[iTime, iLatitude, iLongitude].ToString(), out floatValue))
    {
        Console.WriteLine(dt + "," + dt2 + "," + iTime + "," + dt2.AddHours(iTime) + "," + floatValue);
    }
}
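If the elements do come back boxed as float (an assumption about what SDS returns for object[,,]), a C# type pattern avoids the round trip through a string. `BoxedValues.TryGetRain` is a hypothetical helper name; note that a boxed double would not match `is float`.

```csharp
// Hypothetical helper: extracts a float from a boxed array element.
public static class BoxedValues
{
    public static bool TryGetRain(object element, out float value)
    {
        // Matches only a non-null boxed float (System.Single); null and other types fail.
        if (element is float f)
        {
            value = f;
            return true;
        }
        value = 0f;
        return false;
    }
}
```

Inside the loop this would read: if (BoxedValues.TryGetRain(dataValues[iTime, iLatitude, iLongitude], out float rain)) { ... }.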
Answered Nov 16 '22 16:11 by Ctznkane525