
Read from a huge MemoryStream in C#

I use a BinaryReader over a MemoryStream (new BinaryReader(new MemoryStream(MyByteArray))) to read variable-sized records and process them all in memory. This works well as long as the byte stream held in the array is smaller than about 1.7 GB. Beyond that I cannot create a larger byte array, even though I have enough physical memory: arrays are indexed by a 32-bit int, even on my 64-bit system. My workaround has been to read the byte stream and split it across several byte arrays.

Now, however, I cannot read across the byte array boundaries, and because my records are variable-sized, I cannot ensure that each byte array ends on a whole record.

This must be a common problem for people who process very large datasets and still need speed.

How do I handle this problem?

ManInMoon asked Sep 06 '10 11:09




1 Answer

Edit: Reading up on the basics, I realize that memory-mapped files might be slower than normal I/O for sequential access.

Have you tried something like this:

// Requires: using System.IO;
// Open the file for buffered, sequential reading (16 KB buffer).
var stream = new FileStream("data",
    FileMode.Open,
    FileAccess.Read,
    FileShare.Read,
    16 * 1024,
    FileOptions.SequentialScan);

var reader = new BinaryReader(stream);
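
To illustrate how a reader like this could walk variable-sized records without ever holding the whole file in one array, here is a minimal sketch. The record layout (a 4-byte Int32 length prefix followed by that many payload bytes) and the RecordScanner name are assumptions for illustration only; the question does not say what the real format is.

```csharp
using System;
using System.IO;

static class RecordScanner
{
    // Hypothetical layout: Int32 length prefix, then `length` payload bytes.
    // Works on any seekable stream (FileStream, MemoryStream, ...).
    public static long CountRecords(Stream stream)
    {
        long total = stream.Length; // cache: Length can be costly on FileStream
        long count = 0;
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < total)
            {
                int length = reader.ReadInt32();
                byte[] payload = reader.ReadBytes(length);
                if (payload.Length != length)
                    throw new EndOfStreamException("Truncated record.");
                // Process the payload here; only one record is in memory at a time.
                count++;
            }
        }
        return count;
    }
}
```

Because the stream position is a long, nothing in this loop depends on the 2 GB array limit; only a single record has to fit in a byte array.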

If your data resides in a file and you can use .NET 4.0, consider using MemoryMappedFile.

You can then either use a MemoryMappedViewStream to get a stream or use a MemoryMappedViewAccessor to get a BinaryReader-like interface.
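
A minimal sketch of the MemoryMappedViewStream route, under the same assumed record layout (Int32 length prefix plus payload; the layout and the MappedRecordScanner name are hypothetical). The view is created with the exact file length because a view of default size can be rounded up to a page boundary.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedRecordScanner
{
    // Hypothetical layout: Int32 length prefix, then `length` payload bytes.
    // Note: mapping an empty file throws, so the file must be non-empty.
    public static long CountRecords(string path)
    {
        long fileLength = new FileInfo(path).Length;
        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewStream(
            0, fileLength, MemoryMappedFileAccess.Read))
        using (var reader = new BinaryReader(view))
        {
            long count = 0;
            while (view.Position < fileLength)
            {
                int length = reader.ReadInt32();
                byte[] payload = reader.ReadBytes(length);
                if (payload.Length != length)
                    throw new EndOfStreamException("Truncated record.");
                count++;
            }
            return count;
        }
    }
}
```

For random access rather than a sequential pass, mmf.CreateViewAccessor returns a MemoryMappedViewAccessor whose ReadInt32(position) and related methods take a long offset into the view.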

Rasmus Faber answered Sep 20 '22 03:09