I have the following algorithm:
private void writetodb()
{
    using (var reader = File.OpenRead(@"C:\Data.csv"))
    using (var parser = new TextFieldParser(reader))
    {
        // Do some operations
        while (!parser.EndOfData)
        {
            // Do operations
            // Take 500 rows of data and put them in a data set
            Thread thread = new Thread(() => WriteTodb(tablename, set));
            thread.Start();
            Thread.Sleep(5000);
        }
    }
}
public void WriteTodb(string table, CellSet set)
{
    // WriteToDB
    // Edit: This statement writes to the HBase DB in HDInsight
    hbase.StoreCells(TableName, set);
}
This method works fine for up to about 500 MB of data, but beyond that it fails with an out-of-memory exception.
I am fairly sure that it is because of the threads, but using threads is mandatory and I can't change the architecture.
Can anybody tell me what modifications I have to make in thread programming in the above program to avoid memory exception?
When data structures or data sets that reside in memory become so large that the common language runtime is unable to allocate enough contiguous memory for them, an OutOfMemoryException exception results.
First of all, I don't understand what you mean about threading:
"I have to make in thread programming in the above program to avoid memory exception."
You are still doing thread programming if you use the TPL, as has already been suggested. You really don't have to use the Thread class directly if you are not comfortable with it. You say that your code is C# 4.0, so the TPL is an option for you. You can do your work something like this (a very simple way):
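List<Task> tasks = new List<Task>();
while (!parser.EndOfData)
{
    // Read the next 500 rows into 'set' here, as in the original loop
    // Task.Run requires .NET 4.5; on .NET 4.0 use Task.Factory.StartNew instead
    tasks.Add(Task.Run(() => WriteTodb(tablename, set)));
}
Task.WaitAll(tasks.ToArray());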
The TPL engine uses the default TaskScheduler class, which runs tasks on the internal ThreadPool and can balance the work across the resources available on your server.
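Since the problem in the question is memory, it may also help to cap how many batches are being written at once instead of starting one task per batch for the whole file. The following is only a minimal sketch of that idea using Parallel.ForEach, which is available in .NET 4.0. BatchedWriter, Batches, WriteAll and the store delegate are hypothetical names; store stands in for the hbase.StoreCells call, and rows are plain strings here rather than a CellSet.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchedWriter
{
    // Lazily yields lists of up to batchSize rows, so the whole input
    // is never materialized in memory at once.
    public static IEnumerable<List<string>> Batches(IEnumerable<string> rows, int batchSize)
    {
        var batch = new List<string>(batchSize);
        foreach (var row in rows)
        {
            batch.Add(row);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // trailing partial batch
    }

    // 'store' is a hypothetical stand-in for hbase.StoreCells(tableName, set).
    // At most maxParallel calls to 'store' run at the same time.
    public static void WriteAll(IEnumerable<string> rows, int batchSize,
                                int maxParallel, Action<List<string>> store)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(Batches(rows, batchSize), options, store);
    }
}
```

Because the batches are produced lazily and consumed by a limited number of workers, the number of batches alive at any moment depends on the degree of parallelism rather than on the file size. The default partitioner may still buffer a few batches ahead of the workers.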
Also, I see that you're using the HBase client from Microsoft, and it has an async method:
public async Task StoreCellsAsync(string table, CellSet cells)
{
}
So you can use the asynchronous approach and the TPL at the same time:
List<Task> tasks = new List<Task>();
while (!parser.EndOfData)
{
    tasks.Add(WriteTodb(tablename, set));
}
// asynchronously await all the writes
await Task.WhenAll(tasks.ToArray());

public async Task WriteTodb(string table, CellSet set)
{
    // WriteToDB
    // Edit: This statement writes to the HBase DB in HDInsight asynchronously!
    await hbase.StoreCellsAsync(TableName, set);
}
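Starting every write before awaiting any of them still keeps all pending batches in memory, so for a large file it may be worth limiting how many writes are in flight. This is a rough sketch under assumptions, not the HBase client's own API: ThrottledAsyncWriter and WriteAllAsync are hypothetical names, storeAsync stands in for hbase.StoreCellsAsync, and batches is assumed to be a lazily produced sequence of row groups.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledAsyncWriter
{
    // Starts writes one by one, but never has more than maxInFlight
    // unfinished writes at a time. 'storeAsync' is a hypothetical
    // stand-in for hbase.StoreCellsAsync(tableName, set).
    public static async Task WriteAllAsync(IEnumerable<List<string>> batches,
                                           int maxInFlight,
                                           Func<List<string>, Task> storeAsync)
    {
        var inFlight = new List<Task>();
        foreach (var batch in batches)
        {
            if (inFlight.Count >= maxInFlight)
            {
                // Wait for any one write to finish before starting another.
                Task done = await Task.WhenAny(inFlight);
                inFlight.Remove(done);
                await done; // rethrows if that write failed
            }
            inFlight.Add(storeAsync(batch));
        }
        // Wait for the remaining writes.
        await Task.WhenAll(inFlight);
    }
}
```

Finished tasks are removed from the list as soon as they complete, so their batches become eligible for garbage collection while the rest of the file is still being read.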
If, for some strange reason, you can't use the TPL, you have to refactor your code and write your own thread scheduler.
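One possible shape for such a hand-written scheduler is a fixed set of Thread workers fed from a bounded queue. The sketch below is an illustration under assumptions, not a definitive design: WorkerPool and Run are hypothetical names, store stands in for hbase.StoreCells, and error handling inside the workers is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkerPool
{
    // Runs 'store' on a fixed number of threads. The queue holds at most
    // queueCapacity batches, so the reader is paused while the writers
    // are busy instead of piling up data in memory.
    public static void Run(IEnumerable<List<string>> batches, int workerCount,
                           int queueCapacity, Action<List<string>> store)
    {
        using (var queue = new BlockingCollection<List<string>>(queueCapacity))
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Ends once CompleteAdding was called and the queue is empty.
                    // Note: an exception thrown by 'store' is not handled here.
                    foreach (var batch in queue.GetConsumingEnumerable())
                        store(batch);
                });
                worker.Start();
                workers.Add(worker);
            }

            try
            {
                foreach (var batch in batches)
                    queue.Add(batch); // blocks while the queue is full
            }
            finally
            {
                queue.CompleteAdding(); // lets the workers finish and exit
            }

            foreach (var worker in workers)
                worker.Join();
        }
    }
}
```

This keeps explicit threads in the design, but their number stays constant no matter how large the file is, and no Thread.Sleep is needed to pace the reader.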