I'm failing to read more than 65536 bytes into a buffer from a file using boost::asio::windows::stream_handle asynchronously.
Starting from the 65537th byte, the buffer contains the data from the very beginning of the file rather than the expected data.
Here is a code example that reproduces the issue:
auto handle = ::CreateFile(L"BigFile.xml", GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, nullptr);
boost::asio::io_service ios;
boost::asio::windows::stream_handle streamHandle(ios, handle);
const auto to_read_bytes = 100000;
char buffer[to_read_bytes];
boost::asio::async_read(streamHandle, boost::asio::buffer(buffer, to_read_bytes), [](auto &ec, auto read) {
    std::cout << "Bytes read: " << read << std::endl;
});
ios.run();
auto bufferBegin = std::string(buffer, 38);
auto bufferCorrupted = std::string(buffer + 65536, 38); // <- it contains bytes from the beginning of the file
std::cout << "offset 0: " << bufferBegin << std::endl;
std::cout << "offset 65536: " << bufferCorrupted << std::endl;
::CloseHandle(handle);
That code produces the following output:
> Bytes read: 100000
> offset 0: <?xml version="1.0" encoding="UTF-8"?>
> offset 65536: <?xml version="1.0" encoding="UTF-8"?>
The source file is bigger than 65536 bytes.
This is reproducible with Boost 1.61 + VS2015. The same issue also occurred with Boost 1.55 + VS2010.
The operating systems are Windows 7 and Windows Server 2008 R2.
My questions are:
1. Is this a known limitation in boost::asio or in WinAPI?
2. If it is a known limitation, what would be a safe buffer size for reading data? Is it safe to have a buffer of size 65536, or should it be smaller?
As Tanner Sansbury says, you opened a file with FILE_FLAG_OVERLAPPED but you're trying to use it as a stream. It is not a stream.
async_read() is basically this loop in asio/impl/read.hpp:
for (;;)
{
    stream_.async_read_some(buffers_, ASIO_MOVE_CAST(read_op)(*this));
    buffers_.consume(bytes_transferred);
    total_transferred_ += bytes_transferred;
    if (!ec && bytes_transferred == 0)
        break;
}
The actual maximum number of bytes that will be read in one call comes from completion_condition.hpp:
enum default_max_transfer_size_t { default_max_transfer_size = 65536 };
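To see why the wrong data starts exactly at byte 65537, it may help to work out how a 100000-byte request is split when a single intermediate read is capped at 65536 bytes. The sketch below is plain C++ and does not call Asio; `chunk_sizes` and `kMaxTransfer` are hypothetical names used only for illustration, and it assumes every read fills its chunk completely.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Same value as Asio's default_max_transfer_size; the name is made up here.
constexpr std::size_t kMaxTransfer = 65536;

// Returns the size of each intermediate read needed to transfer `total` bytes
// when one read may move at most `max_chunk` bytes (assuming no short reads).
std::vector<std::size_t> chunk_sizes(std::size_t total,
                                     std::size_t max_chunk = kMaxTransfer)
{
    std::vector<std::size_t> sizes;
    if (max_chunk == 0)
        return sizes; // nothing can be transferred; avoid looping forever
    while (total > 0) {
        const std::size_t n = std::min(total, max_chunk);
        sizes.push_back(n);
        total -= n;
    }
    return sizes;
}
```

For the buffer in the question, `chunk_sizes(100000)` gives two reads of 65536 and 34464 bytes, so the second read is the first one that would need a non-zero file offset.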
The problem is the async_read_some() call above. You'll notice that there's no offset to tell it where to start reading. Because you are using asynchronous reads (also called "overlapped" on Windows), an offset has to be specified for every read.
This is where it ends up, in asio/detail/impl/win_iocp_handle_service.ipp:
DWORD bytes_transferred = 0;
op->Offset = offset & 0xFFFFFFFF;
op->OffsetHigh = (offset >> 32) & 0xFFFFFFFF;
BOOL ok = ::ReadFile(impl.handle_, buffer.data(),
static_cast<DWORD>(buffer.size()),
&bytes_transferred, op);
op->Offset and op->OffsetHigh are always 0. The pointer inside your buffer will advance correctly, but every chunk will be read from the start of the file.
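The symptom can be reproduced without Windows or Asio. The following is a portable simulation, not Asio code: `FakeOverlappedFile` and `read_like_stream` are hypothetical names, and the 65536-byte cap per read is an assumption that mirrors the default transfer size quoted earlier.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for an overlapped file handle: it keeps no file
// pointer, so every read has to say where in the file it starts.
struct FakeOverlappedFile {
    std::string contents;
    std::size_t max_transfer = 65536; // per-read cap (assumption)

    // Copies up to `len` bytes starting at `offset`; returns the count copied.
    std::size_t read_some_at(std::uint64_t offset, char* dst, std::size_t len) const
    {
        if (offset >= contents.size())
            return 0;
        const std::size_t available =
            contents.size() - static_cast<std::size_t>(offset);
        const std::size_t n = std::min({len, max_transfer, available});
        std::memcpy(dst, contents.data() + offset, n);
        return n;
    }
};

// Stream-style loop: the destination pointer advances, the file offset does not.
std::size_t read_like_stream(const FakeOverlappedFile& file, char* dst, std::size_t len)
{
    std::size_t total = 0;
    while (total < len) {
        // The offset argument stays 0 on every iteration, which is the bug.
        const std::size_t n = file.read_some_at(0, dst + total, len - total);
        if (n == 0)
            break;
        total += n;
    }
    return total;
}
```

With a file larger than 65536 bytes and a 100000-byte destination, `read_like_stream` reports 100000 bytes read, yet the bytes stored from index 65536 onward are a second copy of the start of the file.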
There's an async_read_some_at() that's available, which you should use instead, together with windows::random_access_handle. This will set the Offset and OffsetHigh members correctly. You will have to keep track of the number of bytes read yourself.
The documentation for the OVERLAPPED structure says this:
The Offset and OffsetHigh members together represent a 64-bit file position. It is a byte offset from the start of the file or file-like device, and it is specified by the user; the system will not modify these values. The calling process must set this member before passing the OVERLAPPED structure to functions that use an offset, such as the ReadFile or WriteFile (and related) functions.
There's also this part in Synchronous and Asynchronous I/O:
The system does not maintain the file pointer on asynchronous handles to files and devices that support file pointers (that is, seeking devices), therefore the file position must be passed to the read and write functions in the related offset data members of the OVERLAPPED structure. For more information, see WriteFile and ReadFile.
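The 64-bit file position mentioned in the OVERLAPPED documentation is stored as two 32-bit halves, which is what the masking and shifting in the Asio snippet does. A small portable sketch of that split follows; `OffsetParts`, `split_offset` and `join_offset` are hypothetical names, not Windows or Asio identifiers.

```cpp
#include <cstdint>

// Hypothetical mirror of the two offset fields of OVERLAPPED.
struct OffsetParts {
    std::uint32_t low;  // plays the role of OVERLAPPED::Offset
    std::uint32_t high; // plays the role of OVERLAPPED::OffsetHigh
};

// Splits a 64-bit file position into its low and high 32-bit halves.
OffsetParts split_offset(std::uint64_t offset)
{
    OffsetParts parts;
    parts.low = static_cast<std::uint32_t>(offset & 0xFFFFFFFFu);
    parts.high = static_cast<std::uint32_t>((offset >> 32) & 0xFFFFFFFFu);
    return parts;
}

// Recombines the two halves into the original 64-bit position.
std::uint64_t join_offset(OffsetParts parts)
{
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}
```

For the second chunk in the question, the position 65536 fits entirely in the low half, so the high half only matters for files of 4 GiB and larger.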
This is not a limitation of Asio, Windows, or buffer sizes. Rather, Asio is doing exactly what it has been told to do within its specifications: it is reading 100000 bytes from a regular file as if it were a stream. With windows::stream_handle:
- async_read() will be composed of zero or more intermediate async_read_some() operations until either the number of bytes requested by the application has been transferred, or until an error occurs:
  "This operation is implemented in terms of zero or more calls to the stream's async_read_some function, and is known as a composed operation."
- async_read_some() operations may read less than the number of requested bytes:
  "The read operation may not read all of the requested number of bytes."
- each intermediate async_read_some() operation will read from the start of the stream
As the file handle being used is not truly a stream, but rather a regular file, consider using windows::random_access_handle and async_read_at(device, 0, ...). The Random-Access HANDLEs documentation notes:
Boost.Asio provides Windows-specific classes that permit asynchronous read and write operations to be performed on HANDLEs that refer to regular files.
When using windows::random_access_handle and async_read_at():
- async_read_at() will be composed of zero or more intermediate async_read_some_at() operations until either the number of bytes requested by the application has been transferred, or until an error occurs
- async_read_some_at() operations may read less than the number of requested bytes
- each intermediate async_read_some_at() operation will use an offset corresponding to the end of the previous read when reading from the device (e.g. the initial offset + current bytes transferred)
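The way a composed operation chains its intermediate reads can be modelled in a few lines of portable C++. This is a simplified model under stated assumptions, not Asio's implementation: `FakeLoop` and `FakeDevice` are hypothetical types, error codes are left out, and the free function `async_read_at` below only borrows the name of the Asio function it imitates.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single-threaded event loop: completions are queued, then run.
struct FakeLoop {
    std::deque<std::function<void()>> pending;
    void post(std::function<void()> task) { pending.push_back(std::move(task)); }
    void run()
    {
        while (!pending.empty()) {
            std::function<void()> task = std::move(pending.front());
            pending.pop_front();
            task();
        }
    }
};

// Hypothetical random-access device backed by a string, with a per-read cap.
struct FakeDevice {
    FakeLoop& loop;
    std::string contents;
    std::size_t max_transfer;

    FakeDevice(FakeLoop& l, std::string c)
        : loop(l), contents(std::move(c)), max_transfer(65536) {}

    // One intermediate read: copies some bytes, reports the count via the loop.
    void async_read_some_at(std::uint64_t offset, char* dst, std::size_t len,
                            std::function<void(std::size_t)> handler)
    {
        std::size_t n = 0;
        if (offset < contents.size()) {
            n = std::min({len, max_transfer,
                          contents.size() - static_cast<std::size_t>(offset)});
            std::memcpy(dst, contents.data() + offset, n);
        }
        loop.post([handler, n] { handler(n); });
    }
};

// Composed operation: each intermediate read starts where the previous one
// ended, i.e. at the initial offset plus the bytes transferred so far.
void async_read_at(FakeDevice& dev, std::uint64_t offset, char* dst, std::size_t len,
                   std::function<void(std::size_t)> on_done,
                   std::size_t transferred = 0)
{
    if (transferred == len) {
        dev.loop.post([on_done, transferred] { on_done(transferred); });
        return;
    }
    dev.async_read_some_at(offset + transferred, dst + transferred, len - transferred,
        [&dev, offset, dst, len, on_done, transferred](std::size_t n) {
            if (n == 0) { on_done(transferred); return; } // no more data
            async_read_at(dev, offset, dst, len, on_done, transferred + n);
        });
}
```

Nothing is delivered until `FakeLoop::run()` is called, which loosely mirrors how handlers in the question only fire inside `ios.run()`. The device and the destination buffer must outlive the whole chain, as with the real API.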