
Programmer thought process: determining a maximum number of bytes to read when using ReadFile with the Windows API

Tags: c++, windows, api

I need to call the ReadFile function of the Windows API:

BOOL WINAPI ReadFile(
  _In_        HANDLE       hFile,
  _Out_       LPVOID       lpBuffer,
  _In_        DWORD        nNumberOfBytesToRead,
  _Out_opt_   LPDWORD      lpNumberOfBytesRead,
  _Inout_opt_ LPOVERLAPPED lpOverlapped
);

The argument I'm interested in is the 3rd one:

nNumberOfBytesToRead [in]

The maximum number of bytes to be read.

I'm not so much interested in the "magic number" to put there as in the process a seasoned programmer uses to determine the number to put there, preferably in numbered steps.

Also keep in mind I am writing my program in assembler so I'm more interested in the thought process from that perspective.


  • https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467%28v=vs.85%29.aspx
asked Jan 28 '16 by scrooge mcDuck

2 Answers

This requires plenty of insight into both Windows and your hardware. But, in general, here are some possible directions:

  • Is the read buffered or unbuffered? If unbuffered (e.g. the file was opened with FILE_FLAG_NO_BUFFERING), you may not even be able to choose the size freely, but have to follow strict rules for both the size and the alignment of the buffer.
  • In general, you'd want to let the operating system handle as much of the work as possible, because it knows a lot more about the storage device itself and its various users than you do in userspace. So you might want to fetch the whole thing at once, if possible (see points below).
  • If that turns out not to be good enough, you can try to outsmart it by experimenting with various sizes, to catch cases where you can take advantage of buffers the OS already holds but, for whatever reason, wouldn't otherwise reuse for different requests.
  • Otherwise, you might play around with sizes ranging anywhere between the disk sector size and multiples of the page size, as these are most likely to already be cached somewhere, and also to map directly to actual hardware requests.
  • Other than performance, there's the question of how much you can afford to store in your process's memory at any given time.
  • There's also the question of sending large requests which might block other processes from getting the chance to get in there and get some data in between—if the OS doesn't already take care of that somehow.
  • There's also the possibility that, by requesting too-large chunks, the OS might defer your request until other processes get their smaller ones served. On the flip side, if the requests overlap in the file, it might actually serve yours first so it can then serve the others from the cache.

In general, you'd probably want to play around until you get something that works well enough.

answered Oct 15 '22 by Yam Marcovic


That parameter is there only to protect you from a buffer overflow, so you must of course pass the size of the buffer you allocated for this purpose. Other than that, you should only read as many bytes as you are interested in at this exact moment. A modern OS will always use the page cache, so any subsequent access to the file will be as fast as accessing RAM. You can also force the OS to cache the file beforehand if you need it whole.

Edit: My experience is against what Yam Marcovic and others recommend. Caching files and chunking reads into ideal sizes is exactly what the OS is there to do. Do not presume to outsmart it; read just what you need.

answered Oct 15 '22 by user1316208