
Data validation with mostly feature in List<T> using .NET Core C#

I have loaded data into a List<T> by reading from different formats, e.g. CSV, Parquet, Avro, and JSON.

I want to validate the data with a "mostly" feature, e.g. the temperature should stay within range 95% of the time, and for the rest of the time the column value can be null or out of range.

Sample use case expectation:

Expect_Column_Values_To_Be_Between(
    columnName = "temprature",
    minValue   =  60,
    maxValue   =  75,
    mostly     = .95
)

Data annotations seem to solve this only partially (the mostly feature is missing), because they work at the row level, not at the level of the entire table, i.e. the object level.

[Range(60, 75, ErrorMessage = "Thermostat value {0} must be between {1} and {2}.")]
public int Temprature { get; set; }
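
To make the row-level limitation concrete, here is a minimal sketch of how the per-row result of a [Range] check could be aggregated into a fraction by hand. The `Reading` class and the `RowLevelDemo.FractionValid` helper are hypothetical names used only for illustration; the only library call is `Validator.TryValidateObject` from System.ComponentModel.DataAnnotations, which validates one object at a time.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical row type: [Range] is evaluated once per object, never per table.
public class Reading
{
    [Range(60, 75, ErrorMessage = "Thermostat value {0} must be between {1} and {2}.")]
    public int Temprature { get; set; }
}

public static class RowLevelDemo
{
    // Hypothetical helper: fraction of rows that pass their own annotation checks.
    public static double FractionValid(IEnumerable<Reading> rows)
    {
        int total = 0;
        int valid = 0;
        foreach (var row in rows)
        {
            total++;
            // validateAllProperties: true is needed so that [Range] is checked,
            // not only [Required].
            bool ok = Validator.TryValidateObject(
                row, new ValidationContext(row),
                validationResults: null, validateAllProperties: true);
            if (ok)
                valid++;
        }
        // Cast to double so the ratio is not truncated by integer division.
        return total == 0 ? 0.0 : (double)valid / total;
    }
}
```

The caller would still have to compare the returned fraction against the mostly threshold, which is the part data annotations do not provide on their own.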

For reference, the Python package https://github.com/great-expectations/.great_expectations contains similar data-level validations.

I am looking for guidance on how to validate the data, either with an existing equivalent library in .NET or by creating new helper classes/extension methods.

user3542245 asked Dec 04 '21 16:12



1 Answer

I created a sample extension method that validates the data at the table level, i.e. the object level:

public class Room
{
    public int RoomId { get; set; }
    public string Name { get; set; }
    public double Temprature { get; set; }
}

// Usage: build a small table and check that at least 95% of the rows are in range.
List<Room> rooms = new List<Room>();
rooms.Add(new Room() { RoomId = 1, Name = "Hall", Temprature = 65 });
rooms.Add(new Room() { RoomId = 2, Name = "Kitchen", Temprature = 75 });

bool result = rooms.Expect_Column_Values_To_Be_Between("Temprature", 60, 75, .95);
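
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class ValidationExtensions
{
    // Returns true when at least the fraction `mostly` of the rows has
    // `columnName` within [minValue, maxValue] (both bounds inclusive).
    public static bool Expect_Column_Values_To_Be_Between<T>(this List<T> items,
                    string columnName, double minValue, double maxValue, double mostly = 1)
    {
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(nameof(mostly), mostly,
                       "Mostly value cannot be less than 0 or greater than 1");
        else if (mostly == 0)
            return true;

        if (items == null || items.Count == 0)
            return false;

        int itemsInRangeCount = 0;

        foreach (var item in items)
        {
            PropertyInfo? propertyInfo = item.GetType().GetProperty(columnName);
            if (propertyInfo == null)
                throw new InvalidDataException($"Column not found : {columnName}");

            // A null value counts as out of range; Convert.ToDouble(null) would
            // otherwise silently turn it into 0.
            object? rawValue = propertyInfo.GetValue(item);
            if (rawValue == null)
                continue;

            var itemValue = Convert.ToDouble(rawValue);

            if (itemValue >= minValue && itemValue <= maxValue)
                itemsInRangeCount++;
        }

        // Cast to double: integer division would truncate the ratio to 0 or 1.
        return (double)itemsInRangeCount / items.Count >= mostly;
    }
}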
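
As a possible alternative to the reflection-based lookup above, the column can be passed as a selector delegate, which avoids the string property name and makes null handling explicit. This is only a sketch under stated assumptions: `MostlyValidation` and `MostlyBetween` are hypothetical names, not an existing .NET library API, and nulls are assumed to count as out-of-range rows, as described in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the names here are illustrative only.
public static class MostlyValidation
{
    // Returns true when at least the fraction `mostly` of the rows has a
    // non-null selected value within [minValue, maxValue] (inclusive).
    public static bool MostlyBetween<T>(this IEnumerable<T> items,
        Func<T, double?> selector, double minValue, double maxValue, double mostly = 1.0)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(nameof(mostly), mostly,
                "mostly must be between 0 and 1.");

        int total = 0;
        int inRange = 0;
        foreach (var item in items)
        {
            total++;
            double? value = selector(item);
            // Assumption: a null value counts as a row that is not in range.
            if (value.HasValue && value.Value >= minValue && value.Value <= maxValue)
                inRange++;
        }

        // Assumption mirrored from the method above: an empty table only
        // passes when no rows are required to be in range.
        if (total == 0)
            return mostly == 0;

        return (double)inRange / total >= mostly;
    }
}
```

With the Room class from this answer, a call might look like rooms.MostlyBetween(r => r.Temprature, 60, 75, 0.95). The selector is checked by the compiler, so a misspelled column name fails at build time instead of throwing at run time.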
user3542245 answered Oct 23 '22 14:10
