Asynchronous file IO in .Net

I'm building a toy database in C# to learn more about compiler, optimizer, and indexing technology.

I want to maintain maximum parallelism between (at least read) requests for bringing pages into the buffer pool, but I am confused about how best to accomplish this in .NET.

Here are some options and the problems I've come across with each:

  1. Use System.IO.FileStream and the BeginRead method

    But the position in the file isn't an argument to BeginRead; it is a property of the FileStream (set via the Seek method), so I can only issue one request at a time and have to lock the stream for the duration. (Or do I? The documentation is unclear on what would happen if I held the lock only between the Seek and BeginRead calls but released it before calling EndRead. Does anyone know?) I know how to do this; I'm just not sure it is the best way.

  2. There seems to be another way, centered around the System.Threading.Overlapped structure and P/Invoke to the ReadFileEx function in kernel32.dll.

    Unfortunately, there is a dearth of samples, especially in managed languages. This route (if it can be made to work at all) apparently also involves the ThreadPool.BindHandle method and the IO completion threads in the thread pool. I get the impression that this is the sanctioned way of dealing with this scenario under Windows, but I don't understand it, and I can't find an entry point to the documentation that is helpful to the uninitiated.

  3. Something else?

  4. In a comment, jacob suggests creating a new FileStream for each read in flight.

  5. Read the whole file into memory.

    This would work if the database was small. The codebase is small, and there are plenty of other inefficiencies, but the database itself isn't. I also want to be sure I am doing all the bookkeeping needed to deal with a large database (which turns out to be a huge part of the complexity: paging, external sorting, ...) and I'm worried it might be too easy to accidentally cheat.
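Option 4 (one FileStream per read in flight) could look roughly like the sketch below. Each read gets its own handle and therefore its own file position, so no lock is needed between Seek and the read. The names PageReader, ReadPageAsync, and PageSize are made up for illustration, the 8 KB page size is an assumption, and it assumes a runtime with async/await (.NET 4.5 or later), which is newer than this question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class PageReader
{
    // Hypothetical page size; use whatever the buffer pool actually uses.
    public const int PageSize = 8192;

    // Opens a private FileStream for this one read, so the file position
    // is not shared with any other read that is in flight.
    public static async Task<byte[]> ReadPageAsync(string path, long pageNumber)
    {
        var page = new byte[PageSize];
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.ReadWrite, bufferSize: 4096,
                                           useAsync: true))
        {
            stream.Seek(pageNumber * PageSize, SeekOrigin.Begin);

            // A single read may return fewer bytes than requested, so loop.
            int total = 0;
            while (total < PageSize)
            {
                int n = await stream.ReadAsync(page, total, PageSize - total);
                if (n == 0) break; // end of file: the rest of the page stays zeroed
                total += n;
            }
        }
        return page;
    }
}
```

Several pages can then be requested at once, for example with Task.WhenAll(PageReader.ReadPageAsync(path, 3), PageReader.ReadPageAsync(path, 7)). The trade-off is the cost of opening a handle for every read; keeping a small pool of open streams would be one way to reduce that, but that is not shown here.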

Edit

Clarification of why I'm suspicious of solution 1: holding a single lock all the way from BeginRead to EndRead means I need to block anyone who wants to initiate a read just because another read is in progress. That feels wrong, because the thread initiating the new read might be able (in general) to do some more work before the results become available. (Actually, just writing this has led me to think up a new solution, which I have posted as a new answer.)

Doug McClean asked Sep 18 '08 00:09


1 Answer

I'm not sure I see why option 1 wouldn't work for you. Keep in mind that you can't have two different threads trying to use the same FileStream at the same time; doing so will definitely cause you problems. BeginRead/EndRead is meant to let your code continue executing while the potentially expensive IO operation takes place, not to enable some sort of multi-threaded access to a file.

So I would suggest that you Seek and then call BeginRead.
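A minimal sketch of that pattern, assuming the FileStream is only ever touched by one thread at a time. The class name SeekThenBeginRead, the method ReadAt, and the workWhileReading callback are invented for illustration.

```csharp
using System;
using System.IO;

static class SeekThenBeginRead
{
    // Seeks, starts the read, lets the caller do other work, then collects
    // the result. The stream must not be used by anyone else in the meantime.
    public static byte[] ReadAt(FileStream stream, long offset, int count,
                                Action workWhileReading)
    {
        var buffer = new byte[count];

        stream.Seek(offset, SeekOrigin.Begin);
        IAsyncResult pending = stream.BeginRead(buffer, 0, count, null, null);

        // The caller's own work runs here while the read is outstanding.
        workWhileReading();

        // EndRead blocks until the read finishes and returns the byte count,
        // which can be smaller than requested (for example at end of file).
        int read = stream.EndRead(pending);
        if (read < count) Array.Resize(ref buffer, read);
        return buffer;
    }
}
```

For the read to be overlapped at the OS level, the stream would normally be opened with useAsync set to true (or FileOptions.Asynchronous). As far as I know, without that flag BeginRead still works but the read is carried out on a thread pool thread instead.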

John Christensen answered Oct 11 '22 20:10