I use a BinaryReader(MemoryStream(MyByteArray)) to read variable-sized records and process them all in memory. This works well as long as my byte stream, which is in the array, is less than about 1.7 GB in size. Beyond that I cannot create a larger byte array, because array lengths are indexed by a 32-bit integer even on my 64-bit system, although I have enough real memory. So my solution has been to read the byte stream and split it into several byte arrays.
Now however, I cannot "read" across the byte array boundaries, and, as my data is in a variable format, I cannot ensure that byte arrays always finish on a whole record.
This must be a common problem for people who process very large datasets and still need speed.
How do I handle this problem?
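One way to keep the split arrays and still read across their boundaries is to wrap them in a single Stream. The following is only a minimal sketch: MultiArrayStream is a hypothetical helper, not a framework type, and it assumes the arrays are already loaded and are not modified while reading.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of .NET): a read-only, seekable Stream that
// presents several byte arrays as one contiguous sequence of bytes.
public sealed class MultiArrayStream : Stream
{
    private readonly IList<byte[]> _chunks;
    private readonly long _length;
    private long _position;    // absolute position across all chunks
    private int _chunkIndex;   // chunk that contains the current position
    private int _chunkOffset;  // offset inside that chunk

    public MultiArrayStream(IList<byte[]> chunks)
    {
        if (chunks == null) throw new ArgumentNullException("chunks");
        _chunks = chunks;
        foreach (byte[] chunk in chunks) _length += chunk.Length;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return _length; } }

    public override long Position
    {
        get { return _position; }
        set
        {
            if (value < 0 || value > _length)
                throw new ArgumentOutOfRangeException("value");
            // Walk the chunks to find the one that holds the target position.
            long remaining = value;
            _chunkIndex = 0;
            while (_chunkIndex < _chunks.Count &&
                   remaining >= _chunks[_chunkIndex].Length)
            {
                remaining -= _chunks[_chunkIndex].Length;
                _chunkIndex++;
            }
            _chunkOffset = (int)remaining;
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (count > 0 && _chunkIndex < _chunks.Count)
        {
            byte[] chunk = _chunks[_chunkIndex];
            int available = chunk.Length - _chunkOffset;
            if (available == 0)
            {
                // Current chunk is exhausted: continue in the next one.
                _chunkIndex++;
                _chunkOffset = 0;
                continue;
            }
            int n = Math.Min(available, count);
            Buffer.BlockCopy(chunk, _chunkOffset, buffer, offset, n);
            _chunkOffset += n;
            _position += n;
            offset += n;
            count -= n;
            total += n;
        }
        return total; // 0 signals end of stream
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin == SeekOrigin.Begin ? offset
                    : origin == SeekOrigin.Current ? _position + offset
                    : _length + offset;
        Position = target;
        return target;
    }

    public override void Flush() { }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

With this in place, new BinaryReader(new MultiArrayStream(chunks)) can be used much like the original reader, and a record that starts in one array and ends in the next is read transparently. The cost is an extra virtual call per read compared with a plain MemoryStream, so it is worth measuring before relying on it.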
Reading a large file into memory all at once may consume the computer's entire RAM and cause an error. In such cases, it makes sense to divide the data into chunks, which can then be read and processed sequentially. In pandas, this is done with the chunksize parameter of read_csv.
You can use data parallelism from the Task Parallel Library, for example a simple Parallel loop.
You can reuse the MemoryStream by setting Position to 0 and calling SetLength(0). Setting the length to 0 does not clear or release the existing buffer; it only resets the internal counters.
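As a rough illustration of that reset, the sketch below writes to a MemoryStream, resets it, and writes again. MemoryStreamReuseDemo and Run are hypothetical names used only for this example; the returned Capacity values are there to show that the backing buffer survives the reset.

```csharp
using System;
using System.IO;

public static class MemoryStreamReuseDemo
{
    // Returns { capacity before reset, capacity after reset, final length }.
    public static int[] Run()
    {
        var stream = new MemoryStream();
        stream.Write(new byte[1024], 0, 1024);
        int capacityBefore = stream.Capacity;

        // Reset for reuse: rewind and truncate. The buffer is kept.
        stream.Position = 0;
        stream.SetLength(0);
        int capacityAfter = stream.Capacity;

        // The next payload is written into the same buffer.
        stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
        return new[] { capacityBefore, capacityAfter, (int)stream.Length };
    }
}
```

Note that Length is a read-only property, so the truncation has to go through SetLength.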
You needn't call either Close or Dispose. MemoryStream doesn't hold any unmanaged resources, so the only resource to be reclaimed is memory. The memory will be reclaimed during garbage collection with the rest of the MemoryStream object when your code no longer references the MemoryStream.
Edit: Reading up on the basics, I realize that memory-mapped files might be slower than normal I/O for sequential access.
Have you tried something like this:
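    // Requires: using System.IO;
    // Open the file read-only with a 16 KB buffer and hint sequential access.
    var stream = new FileStream("data",
                                FileMode.Open,
                                FileAccess.Read,
                                FileShare.Read,
                                16 * 1024,
                                FileOptions.SequentialScan);
    var reader = new BinaryReader(stream);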
If your data resides in a file and you can use .NET 4.0, consider using MemoryMappedFile. You can then either use a MemoryMappedViewStream to get a stream or use a MemoryMappedViewAccessor to get a BinaryReader-like interface.
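A minimal sketch of the MemoryMappedViewStream route follows. The record layout (a 4-byte length followed by that many payload bytes) is an assumption made for illustration, and MappedFileDemo and SumRecordLengths are hypothetical names. The loop is bounded by the file length rather than the view length, because a view can be rounded up to a page boundary.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedFileDemo
{
    // Reads assumed [int32 length][payload] records through a memory-mapped
    // view and returns the total number of payload bytes.
    // Note: mapping an empty file throws, so the file must not be empty.
    public static long SumRecordLengths(string path)
    {
        long fileLength = new FileInfo(path).Length;
        long total = 0;

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewStream(
                   0, fileLength, MemoryMappedFileAccess.Read))
        using (var reader = new BinaryReader(view))
        {
            // Stop at the real end of the file, not the end of the view.
            while (view.Position < fileLength)
            {
                int length = reader.ReadInt32();
                byte[] payload = reader.ReadBytes(length);
                total += payload.Length;
            }
        }
        return total;
    }
}
```

Because the whole file is exposed as one stream, records never straddle an array boundary, and no single managed array has to hold the full data set. Mapping a multi-gigabyte file in one view needs a 64-bit process.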