
byte[] and efficiently passing by reference

This is in relation to dealing with the Large Object Heap and trying to minimize the number of times I instantiate a byte[]. Basically, I'm getting OutOfMemoryExceptions and I feel like it's because we're instantiating too many byte arrays. The program works fine when we process a couple of files, but it needs to scale, and it currently can't.

In a nutshell, I've got a loop that pulls documents from a database. Currently, it pulls one document at a time and then processes it. Documents can range from less than a meg to 400+ megs (which is why I'm processing one at a time). The following is pseudo-code from before I optimized anything.

So the steps I'm doing are:

  1. Make a call to the database to find the largest file size (and then multiply it by 1.1)

    var maxDataSize = new BiztalkBinariesData().GetMaxFileSize();
    maxDataSize = (maxDataSize != null && maxDataSize > 0)
        ? (long)(maxDataSize * 1.1)
        : 0;
    var FileToProcess = new byte[maxDataSize];
    
  2. Then I make another database call that pulls all of the documents (without data) and places them into an IEnumerable.

    UnprocessedDocuments =
        claimDocumentData.Select(StatusCodes.CurrentStatus.WaitingToBeProcessed);
    foreach (var currentDocument in UnprocessedDocuments)
    {
         // all of the following code goes here
    }
    
  3. Then I populate my byte[] array from an external source:

    FileToProcess = new BiztalkBinariesData()
        .Get(currentDocument.SubmissionSetId, currentDocument.FullFileName);
    
  4. Here is the question. It would be much cleaner to pass the currentDocument (IClaimDocument) to other methods for processing. So if I set the data part of the currentDocument to the pre-allocated array, will this use the existing reference? Or does it create a new array in the Large Object Heap?

    currentDocument.Data = FileToProcess;
    
  5. At the end of the loop, I would then clear FileToProcess

    Array.Clear(FileToProcess, 0, FileToProcess.Length);
    

Was that clear? If not, I'll try to clean it up.

Cyfer13 asked Jan 31 '12 15:01



2 Answers

Step 1:

    var FileToProcess = new byte[maxDataSize];

Step 3:

    FileToProcess = new BiztalkBinariesData()
        .Get(currentDocument.SubmissionSetId, currentDocument.FullFileName);

Your step 1 is completely unnecessary. In step 3 you re-assign the variable, so you are creating a new array rather than populating the existing one. Step 1 therefore just creates more work for the GC. If these allocations happen in quick succession (and the unused allocation is not optimized away by the compiler, which is entirely possible), that might explain some of the memory pressure you are seeing.
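To make the difference between re-assigning and populating concrete, here is a minimal sketch. It is not the asker's data layer: `GetNewArray` and `FillBuffer` are hypothetical stand-ins for `BiztalkBinariesData.Get`, and whether the real class could offer a fill-style method is an assumption.

```csharp
using System;
using System.IO;

public static class BufferReuseSketch
{
    // Hypothetical stand-in for a Get(...) that returns a fresh array per call.
    public static byte[] GetNewArray(Stream source)
    {
        using (var copy = new MemoryStream())
        {
            source.CopyTo(copy);
            return copy.ToArray(); // allocates a new byte[] every time
        }
    }

    // Hypothetical fill-style alternative: reads into a caller-owned buffer
    // and returns the number of bytes actually written.
    public static int FillBuffer(Stream source, byte[] buffer)
    {
        int total = 0;
        int read;
        while (total < buffer.Length &&
               (read = source.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        var payload = new byte[] { 1, 2, 3 };

        var preallocated = new byte[16];
        var fileToProcess = preallocated;

        // Re-assignment: the variable now points at a different array;
        // the preallocated one is left unused.
        fileToProcess = GetNewArray(new MemoryStream(payload));
        Console.WriteLine(object.ReferenceEquals(fileToProcess, preallocated)); // False

        // Filling: the same array instance is reused.
        fileToProcess = preallocated;
        int length = FillBuffer(new MemoryStream(payload), fileToProcess);
        Console.WriteLine(object.ReferenceEquals(fileToProcess, preallocated)); // True
        Console.WriteLine(length); // 3
    }
}
```

Note that with a fill-style approach the buffer is usually larger than the document, so the valid length has to be carried alongside the array; code that relies on the array's `Length` would see the buffer size instead.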

BrokenGlass answered Oct 21 '22 04:10


Arrays are reference types, so assigning or passing one copies only the reference, not the array itself. A copy of the data would be made only for value types.

This simple snippet illustrates how arrays behave as reference types:

    public void Test()
    {
        var intArray = new[] {1, 2, 3, 4};
        EditArray(intArray);
        Console.WriteLine(intArray[0].ToString()); // output will be 0
    }

    public void EditArray(int[] intArray)
    {
        intArray[0] = 0;
    }
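
Applied to step 4 of the question, the same behavior can be sketched as follows. `ClaimDocumentSketch` is a hypothetical minimal type, since the real `IClaimDocument` is not shown; the sketch assumes `Data` is a plain property whose setter stores the reference rather than copying the bytes.

```csharp
using System;

// Hypothetical minimal document type; the real IClaimDocument is not shown.
public class ClaimDocumentSketch
{
    public byte[] Data { get; set; }
}

public static class ReferenceAssignmentSketch
{
    public static void Main()
    {
        // Large enough to exceed the roughly 85,000-byte Large Object Heap threshold.
        var fileToProcess = new byte[100000];
        var currentDocument = new ClaimDocumentSketch();

        // Copies the reference only; no second array is allocated.
        currentDocument.Data = fileToProcess;
        Console.WriteLine(object.ReferenceEquals(currentDocument.Data, fileToProcess)); // True

        // A write through one reference is visible through the other.
        fileToProcess[0] = 42;
        Console.WriteLine(currentDocument.Data[0]); // 42

        // Clearing the shared buffer also clears what the document sees.
        Array.Clear(fileToProcess, 0, fileToProcess.Length);
        Console.WriteLine(currentDocument.Data[0]); // 0
    }
}
```

Because both names refer to one array, clearing or refilling the buffer for the next document also changes what any earlier document object still holding that reference sees.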

InBetween answered Oct 21 '22 04:10