
Out of memory exception while using threads

I have the following algorithm:

private void writetodb()
{
    using (var reader = File.OpenRead(@"C:\Data.csv"))
    using (var parser = new TextFieldParser(reader))
    {
        // Do some operations
        while (!parser.EndOfData)
        {
            // Do operations
            // Take 500 rows of data and put them in a dataset
            Thread thread = new Thread(() => WriteTodb(tablename, set));
            thread.Start();
            Thread.Sleep(5000);
        }
    }
}

public void WriteTodb(string table, CellSet set)
{
    // WriteToDB
    // Edit: This statement will write to the HBase db in HDInsight
    hbase.StoreCells(table, set);
}

This method works fine for up to 500 MB of data, but beyond that it fails with an OutOfMemoryException.

I am fairly sure that the threads are the cause, but using threads is mandatory and I can't change the architecture.
Can anybody tell me what modifications I have to make to the thread programming in the above program to avoid the memory exception?

user1907849 asked Jul 27 '15 16:07



1 Answer

First of all, I don't understand your remark about threading:

I have to make in thread programming in the above program to avoid memory exception.

You are still doing thread programming if you use the TPL (Task Parallel Library), as has already been suggested. You don't have to use the Thread class directly if you aren't comfortable with it. You say that your code is C# 4.0, so the TPL is an option for you. Note that Task.Run was added in .NET 4.5; on .NET 4.0, use Task.Factory.StartNew instead. You can do your work something like this (a very simple way):

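List<Task> tasks = new List<Task>();
while (!parser.EndOfData)
{
    // Read the next 500 rows into a fresh CellSet declared inside the loop,
    // so that each task captures its own batch.
    tasks.Add(Task.Run(() => WriteTodb(tablename, set)));
}
// Block until every write has finished.
Task.WaitAll(tasks.ToArray());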

The TPL uses the default TaskScheduler, which runs tasks on the internal ThreadPool and balances the work against the resources available on your server.

Also, I see that you're using the HBase client from Microsoft, and it has an async method:

public async Task StoreCellsAsync(string table, CellSet cells)
{
}

So you can use the asynchronous approach and the TPL in your code at the same time:

List<Task> tasks = new List<Task>();
while (!parser.EndOfData)
{
    tasks.Add(WriteTodb(tablename, set));
}
// asynchronously await all the writes
await Task.WhenAll(tasks.ToArray());

public async Task WriteTodb(string table, CellSet set)
{
    // WriteToDB
    // Edit: This statement will write to the HBase db in HDInsight asynchronously!
    await hbase.StoreCellsAsync(table, set);
}
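
Since the original problem is memory, it may also help to cap how many writes are in flight at once, so that only a few batches exist in memory at a time. The following is a minimal sketch of that idea using SemaphoreSlim, not part of the HBase client: ThrottledWriterSketch, WriteAllAsync, WriteOneAsync, and maxInFlight are hypothetical names chosen for illustration, and SemaphoreSlim.WaitAsync requires .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledWriterSketch
{
    // Hypothetical helper: starts one asynchronous write per batch,
    // but never lets more than maxInFlight writes run at the same time.
    public static async Task WriteAllAsync<T>(
        IEnumerable<T> batches, Func<T, Task> writeBatchAsync, int maxInFlight)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = new List<Task>();
            foreach (var batch in batches)
            {
                // Wait for a free slot before starting the next write.
                await gate.WaitAsync();
                tasks.Add(WriteOneAsync(batch, writeBatchAsync, gate));
            }
            // All writes have been started; wait for the remaining ones.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task WriteOneAsync<T>(
        T batch, Func<T, Task> writeBatchAsync, SemaphoreSlim gate)
    {
        try
        {
            await writeBatchAsync(batch);
        }
        finally
        {
            // Free the slot even if the write failed.
            gate.Release();
        }
    }
}
```

If the batches come from a lazy sequence (for example an iterator method that reads the next 500 rows on demand), the next batch is only read once a slot is free. A call might look like `await ThrottledWriterSketch.WriteAllAsync(batches, set => hbase.StoreCellsAsync(tablename, set), 4);`, where the limit of 4 is an arbitrary assumption to tune.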

If, for some reason, you can't use the TPL, you will have to refactor your code and write your own thread scheduler:

  1. You don't have to create a new thread for each write; you can reuse threads.
  2. Running a second operation on an existing thread is, in general, faster than creating a separate thread for each operation.
  3. Split the file into parts, create one thread per part, and have each thread write its data in a loop.
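
One way to sketch the points above is a fixed set of worker threads that pull batches from a bounded queue. This is only an illustration under stated assumptions, not the only possible design: BatchWriterSketch, WriteAll, workerCount, and maxQueuedBatches are hypothetical names, while BlockingCollection<T> is a standard class available since .NET 4.0.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class BatchWriterSketch
{
    // Hypothetical helper: runs writeBatch on a fixed number of reusable threads.
    public static void WriteAll<T>(
        IEnumerable<T> batches, Action<T> writeBatch,
        int workerCount, int maxQueuedBatches)
    {
        // The bounded capacity makes Add block while the queue is full,
        // so the reader cannot get far ahead of the writers.
        using (var queue = new BlockingCollection<T>(maxQueuedBatches))
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Each thread keeps taking batches until the queue
                    // is marked complete and has been drained.
                    // Note: an exception thrown by writeBatch is not handled here.
                    foreach (var batch in queue.GetConsumingEnumerable())
                    {
                        writeBatch(batch);
                    }
                });
                worker.Start();
                workers.Add(worker);
            }

            foreach (var batch in batches)
            {
                queue.Add(batch);
            }
            // Signal that no more batches will arrive.
            queue.CompleteAdding();

            foreach (var worker in workers)
            {
                worker.Join();
            }
        }
    }
}
```

With this shape, the number of threads and the number of batches held in memory are both fixed up front, instead of growing with the size of the file. In the question's code, writeBatch would be something like `set => WriteTodb(tablename, set)`.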

VMAtm answered Sep 21 '22 00:09