Converting a large set of data into C# objects

Tags:

c#

I'm making a complex app (the planning involves articles, sales, clients, manufacturing, machines...) that uses the information provided by the SQL Server database of an ERP.

I use about 30 different related objects, each of which has its info stored in a table/view. Some of these tables have from 20k to 100k records.

I need to convert these tables into C# objects for further processing that cannot be done in SQL. I do not need all the rows, but there isn't a way to determine in advance which ones I will need, as this depends on runtime events.

The question is about the best way to do this. I have tried the following approaches:

  1. Retrieve all the data and store it in a DataSet using a SqlDataAdapter, which occupies about 300 MB of RAM. The first problem here is synchronization, but it's acceptable since the data isn't going to change much during execution.

    Then I run through every row and convert it to a C# object, stored in static dictionaries for fast access by key. The problem with this is that creating so many objects (millions) takes memory usage up to 1.4 GB, which is too much. Memory aside, data access is very fast.
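
For reference, a minimal sketch of the retrieval step. The connection string and SQL are illustrative, and `Datos.bd` is the static DataSet that the code further down reads from:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public static class Datos
    {
        // Hypothetical connection string; the real one points at the ERP database.
        public const string Conexion =
            "Server=erp-server;Database=ERP;Integrated Security=true";

        // Static DataSet holding the local copy of the tables/views.
        public static DataSet bd = new DataSet();

        public static void CargarTodo()
        {
            using (var conexion = new SqlConnection(Conexion))
            {
                // One adapter per table/view; "articulos" is one of the ~30.
                new SqlDataAdapter("SELECT * FROM articulos", conexion)
                    .Fill(bd, "articulos");
                // ...and so on for the remaining tables/views.
            }
        }
    }

In option 1, after the fill, a loop walks `bd.Tables["articulos"].Rows` (and the other tables) and builds one object per row up front.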

Since loading everything takes too much memory, I figured I needed some kind of lazy loading, so I tried:

  2. Query the database directly through a SqlDataReader, filtering by the item I need, only the first time it's required; the result is then stored in the static dictionary. This way memory usage is minimal, but it is slow (on the order of minutes), since it means making about a million separate queries, which the server doesn't seem to like (low performance).
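
A sketch of what one of those per-item queries looks like, reusing the illustrative `Datos.Conexion` string from above (column and table names mirror the code further down):

    // Fetch a single article on demand with a parameterized query.
    private static ProductoTerminado ConsultarArticulo(string idArticulo)
    {
        using (var conexion = new SqlConnection(Datos.Conexion))
        using (var comando = new SqlCommand(
            "SELECT Descripcion, CantidadBulto, PT FROM articulos WHERE IdArticulo = @id",
            conexion))
        {
            comando.Parameters.AddWithValue("@id", idArticulo);
            conexion.Open();
            using (SqlDataReader lector = comando.ExecuteReader())
            {
                if (!lector.Read()) return null;
                return new ProductoTerminado(idArticulo)
                {
                    descripcion = lector.GetString(0),
                    paletizacion = lector.GetInt32(1),
                    pinturaTipo = PinturaTipo.Obtener(lector.GetString(2))
                };
            }
        }
    }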

Lastly, I tried an intermediate approach that kind of works, but I'm not sure it's optimal; I suspect it's not:

  3. Fill a DataSet containing all the info and keep a local static copy, but don't convert the rows to objects up front, only on demand (lazily), something like this:

    public class ProductoTerminado : Articulo
    {
        private static Dictionary<string, ProductoTerminado> productosTerminados =
            new Dictionary<string, ProductoTerminado>();

        public PinturaTipo pinturaTipo { get; set; }

        public ProductoTerminado(string id) : base(id) { }

        public static ProductoTerminado Obtener(string idArticulo)
        {
            idArticulo = idArticulo.ToUpper();
            if (productosTerminados.ContainsKey(idArticulo))
            {
                return productosTerminados[idArticulo];
            }
            else
            {
                ProductoTerminado productoTerminado = new ProductoTerminado(idArticulo);
                // This is where I get new data from that static DataSet.
                var fila = Datos.bd.Tables["articulos"]
                    .Select("IdArticulo = '" + idArticulo + "'").First();
                // Then I fill the object and add it to the dictionary.
                productoTerminado.descripcion = fila["Descripcion"].ToString();
                productoTerminado.paletizacion = Convert.ToInt32(fila["CantidadBulto"]);
                productoTerminado.pinturaTipo = PinturaTipo.Obtener(fila["PT"].ToString());
                productosTerminados.Add(idArticulo, productoTerminado);
                return productoTerminado;
            }
        }
    }
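
One possible tweak to the lookup itself (I haven't measured it): `DataTable.Select` with a string filter scans the whole table, while declaring a primary key on the cached table lets `Rows.Find` use the table's key index. A sketch, using the same articulos table:

    // Once, after filling the DataSet: declare the key column so
    // lookups can use the table's internal index.
    DataTable articulos = Datos.bd.Tables["articulos"];
    articulos.PrimaryKey = new[] { articulos.Columns["IdArticulo"] };

    // Then, inside Obtener, an indexed lookup instead of Select():
    DataRow fila = articulos.Rows.Find(idArticulo);

This also sidesteps building the filter string by hand.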
    

So, is this a good way to proceed or should I look into Entity Framework or something like a strongly typed DataSet?

asked Nov 09 '22 by Pinx0

1 Answer

> I use relations between about 30 different objects, each of which has its info stored in a table/view. Some of these tables have from 20k to 100k records.

I suggest making a different decision for different types of objects. Usually, the tables with thousands of records are the ones more likely to change, while tables with fewer records are less likely to. In a project I was working on, the decision was to cache the objects that don't change in a List<T> at start-up. For a few hundred instances this should take well under a second.
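
A rough sketch of that start-up cache; `ErpDataContext` and its `PinturaTipos` table are hypothetical names standing in for the real mapped classes:

    using System.Collections.Generic;
    using System.Linq;

    public static class Cache
    {
        public static List<PinturaTipo> PinturaTipos { get; private set; }

        public static void Inicializar(ErpDataContext db)
        {
            // Small, rarely-changing tables: load once and keep in memory.
            PinturaTipos = db.PinturaTipos.ToList();
            // ...repeat for the other small lookup tables.
        }
    }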

If you are using LINQ to SQL, have the object local in a List<T>, and have correctly defined the FK constraints, you can use obj.Items to access the rows of the Items table filtered by obj's ID (in this example obj holds the PK and Items is the FK side).
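
For illustration, a minimal LINQ to SQL mapping of such an association (table and column names are made up, not taken from the question):

    using System.Data.Linq;
    using System.Data.Linq.Mapping;

    [Table(Name = "Ventas")]
    public class Venta
    {
        [Column(IsPrimaryKey = true)]
        public int IdVenta { get; set; }

        [Column]
        public string IdCliente { get; set; }
    }

    [Table(Name = "Clientes")]
    public class Cliente
    {
        [Column(IsPrimaryKey = true)]
        public string IdCliente { get; set; }

        private EntitySet<Venta> _ventas = new EntitySet<Venta>();

        // With the FK mapped, cliente.Ventas runs a quick query
        // filtered by the client's primary key.
        [Association(Storage = "_ventas", OtherKey = "IdCliente")]
        public EntitySet<Venta> Ventas
        {
            get { return _ventas; }
            set { _ventas.Assign(value); }
        }
    }

With a cliente obtained from the DataContext (or from the start-up cache), enumerating cliente.Ventas issues one query filtered by that client's key.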

This design will also give users the performance they expect: when working on small sets, everything is instantaneous (cached); when working on larger sets but making small selects or inserts, performance is good (quick queries that use the PK). You only really suffer when you start doing queries that join multiple big tables, and in those cases users will probably expect some delay (though I can't be certain without knowing more about the use case).

answered Nov 15 '22 by ytoledano