
What's a clean way to break up a DataTable into chunks of a fixed size with Linq?

Update: Here's a similar question


Suppose I have a DataTable with a few thousand DataRows in it.

I'd like to break up the table into smaller chunks of rows for processing.

I thought C#3's improved ability to work with data might help.

This is the skeleton I have so far:

DataTable Table = GetTonsOfData();

// Chunks should be any IEnumerable<Chunk> type
var Chunks = ChunkifyTableIntoSmallerChunksSomehow; // ** help here! **

foreach(var Chunk in Chunks)
{
   // Chunk should be any IEnumerable<DataRow> type
   ProcessChunk(Chunk);
}

Any suggestions on what should replace ChunkifyTableIntoSmallerChunksSomehow?

I'm really interested in how someone would do this with access to C#3 tools. If attempting to apply these tools is inappropriate, please explain!


Update 3 (revised the chunking, as I really want tables, not IEnumerables; going with an extension method, thanks Jacob):

Final implementation:

Extension method to handle the chunking:

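using System.Collections.Generic;
using System.Data;
using System.Linq; // needed for Skip() and Take()

public static class HarenExtensions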
{
    public static IEnumerable<DataTable> Chunkify(this DataTable table, int chunkSize)
    {
        for (int i = 0; i < table.Rows.Count; i += chunkSize)
        {
            DataTable Chunk = table.Clone();

            foreach (DataRow Row in table.Select().Skip(i).Take(chunkSize))
            {
                Chunk.ImportRow(Row);
            }

            yield return Chunk;
        }
    }
}

Example consumer of that extension method, with sample output from an ad hoc test:

class Program
{
    static void Main(string[] args)
    {
        DataTable Table = GetTonsOfData();

        foreach (DataTable Chunk in Table.Chunkify(100))
        {
            Console.WriteLine("{0} - {1}", Chunk.Rows[0][0], Chunk.Rows[Chunk.Rows.Count - 1][0]);
        }

        Console.ReadLine();
    }

    static DataTable GetTonsOfData()
    {
        DataTable Table = new DataTable();
        Table.Columns.Add(new DataColumn());

        for (int i = 0; i < 1000; i++)
        {
            DataRow Row = Table.NewRow();
            Row[0] = i;

            Table.Rows.Add(Row);
        }

        return Table;
    }
}
Michael Haren asked Feb 03 '23 11:02



1 Answer

This is quite readable and iterates through the sequence only once, perhaps sparing you the poor performance of repeated, redundant Skip() / Take() calls:

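// Requires System.Collections.Generic, System.Data and System.Linq (for Any()).
public IEnumerable<IEnumerable<DataRow>> Chunkify(DataTable table, int size)
{
    List<DataRow> chunk = new List<DataRow>(size);

    // DataRowCollection is a non-generic IEnumerable, so the element type
    // must be stated explicitly; with var, row would be typed as object.
    foreach (DataRow row in table.Rows)
    {
        chunk.Add(row);
        if (chunk.Count == size)
        {
            // Hand off the full chunk and start a fresh list for the next one.
            yield return chunk;
            chunk = new List<DataRow>(size);
        }
    }

    // Emit the final, partially filled chunk, if any.
    if (chunk.Any()) yield return chunk;
}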
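
If DataTable chunks are wanted, as in the question's Update 3, the same single-pass loop can probably fill cloned tables directly instead of lists. What follows is only a sketch under that assumption: the names ChunkExtensions, ChunkifyTables and Demo are made up for illustration, and the argument checks are an addition that neither the question nor the answer includes.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ChunkExtensions // hypothetical name
{
    // Validates eagerly, then defers to the iterator so bad arguments
    // fail at the call site rather than on first enumeration.
    public static IEnumerable<DataTable> ChunkifyTables(this DataTable table, int chunkSize)
    {
        if (table == null) throw new ArgumentNullException(nameof(table));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(table, chunkSize);
    }

    private static IEnumerable<DataTable> Iterate(DataTable table, int chunkSize)
    {
        // Clone() copies the schema (columns, constraints) but no rows.
        DataTable chunk = table.Clone();

        // One pass over the rows; no Skip()/Take() per chunk.
        foreach (DataRow row in table.Rows)
        {
            chunk.ImportRow(row);
            if (chunk.Rows.Count == chunkSize)
            {
                yield return chunk;
                chunk = table.Clone();
            }
        }

        // Emit the final, partially filled table, if any.
        if (chunk.Rows.Count > 0) yield return chunk;
    }
}

public static class Demo // hypothetical test harness
{
    public static void Main()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        for (int i = 0; i < 250; i++) table.Rows.Add(i);

        // Print the first and last Id of each chunk.
        foreach (DataTable chunk in table.ChunkifyTables(100))
        {
            Console.WriteLine("{0} - {1}", chunk.Rows[0][0], chunk.Rows[chunk.Rows.Count - 1][0]);
        }
    }
}
```

With 250 rows and a chunk size of 100, this prints "0 - 99", "100 - 199" and "200 - 249". Because the method is an iterator, each chunk table is built only when the consumer asks for it, so at most one chunk is being filled at a time.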
mqp answered Feb 06 '23 01:02
