
boost::asio fails to read more than 65536 bytes from file

I can't read more than 65536 bytes from a file into a buffer asynchronously using boost::asio::windows::stream_handle.

Starting from the 65537th byte, the buffer contains data from the very beginning of the file rather than the expected data.

Here is a code example that reproduces the issue:

auto handle = ::CreateFile(L"BigFile.xml", GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, nullptr);
boost::asio::io_service ios;

boost::asio::windows::stream_handle streamHandle(ios, handle);

const auto to_read_bytes = 100000;
char buffer[to_read_bytes];

boost::asio::async_read(streamHandle, boost::asio::buffer(buffer, to_read_bytes), [](auto &ec, auto read) {
    std::cout << "Bytes read: " << read << std::endl;
});

ios.run();

auto bufferBegin = std::string(buffer, 38);
auto bufferCorrupted = std::string(buffer + 65536, 38);   // <- it contains bytes from the beginning of the file

std::cout << "offset 0: " << bufferBegin << std::endl;
std::cout << "offset 65536: " << bufferCorrupted << std::endl;   

::CloseHandle(handle);

That code produces the following output:

> Bytes read: 100000  
> offset 0: <?xml version="1.0" encoding="UTF-8"?>  
> offset 65536: <?xml version="1.0" encoding="UTF-8"?>

The source file is larger than 65536 bytes.

This is reproducible with Boost 1.61 + VS2015. The same issue occurred with Boost 1.55 + VS2010.
The operating systems are Windows 7 and Windows Server 2008 R2.

My questions are:
1. Is this a known limitation of boost::asio or of WinAPI?
2. If it is a known limitation, what buffer size is safe for reading data? Is a buffer of 65536 bytes safe, or should it be smaller?

Alexander Stepaniuk asked Dec 24 '22 03:12


2 Answers

As Tanner Sansbury says, you opened a file with FILE_FLAG_OVERLAPPED but you're trying to use it as a stream, which it is not.

async_read() is basically this loop in asio/impl/read.hpp:

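for (;;)
{
    // Start the next partial read. Note that no file offset is passed.
    stream_.async_read_some(buffers_, ASIO_MOVE_CAST(read_op)(*this));

    // On completion, advance past the bytes just received.
    buffers_.consume(bytes_transferred);
    total_transferred_ += bytes_transferred;

    // Simplified: the real code also stops once the buffers are full
    // or the completion condition is satisfied.
    if (!ec && bytes_transferred == 0)
        break;
}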

The actual maximum number of bytes that will be read in one call comes from completion_condition.hpp:

enum default_max_transfer_size_t { default_max_transfer_size = 65536 };

The problem is the async_read_some() call above. You'll notice that there's no offset to tell it where to start reading. Because you are using asynchronous reads (also called "overlapped" on Windows), an offset has to be specified for every read.

This is where it ends up, in asio/detail/impl/win_iocp_handle_service.ipp:

DWORD bytes_transferred = 0;
op->Offset = offset & 0xFFFFFFFF;
op->OffsetHigh = (offset >> 32) & 0xFFFFFFFF;
BOOL ok = ::ReadFile(impl.handle_, buffer.data(),
    static_cast<DWORD>(buffer.size()),
    &bytes_transferred, op);

With a stream handle, op->Offset and op->OffsetHigh are always 0. The pointer into your buffer advances correctly, but every chunk is read from the start of the file.
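
The effect can be reproduced without Windows or Asio. The following is a minimal, portable sketch, not Asio's code: read_chunk_at is a hypothetical stand-in for an overlapped ReadFile call capped at 65536 bytes per call, and composed_read_with_zero_offset is a hypothetical loop that advances the destination pointer but always passes offset 0, as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an overlapped ReadFile call: copies at most
// max_chunk bytes of `file`, starting at `offset`, into `dest`.
std::size_t read_chunk_at(const std::string& file, std::size_t offset,
                          char* dest, std::size_t wanted,
                          std::size_t max_chunk = 65536)
{
    assert(max_chunk > 0);
    if (offset >= file.size())
        return 0;  // nothing left to read
    const std::size_t n = std::min({wanted, max_chunk, file.size() - offset});
    std::memcpy(dest, file.data() + offset, n);
    return n;
}

// Mimics the composed read over a stream handle: the destination pointer
// advances with every chunk, but the file offset passed down is always 0.
std::vector<char> composed_read_with_zero_offset(const std::string& file,
                                                 std::size_t to_read)
{
    std::vector<char> buffer(to_read);
    std::size_t total = 0;
    while (total < to_read) {
        const std::size_t n = read_chunk_at(file, /*offset=*/0,
                                            buffer.data() + total,
                                            to_read - total);
        if (n == 0)
            break;
        total += n;
    }
    buffer.resize(total);
    return buffer;
}
```

For a 200000-byte file and a 100000-byte request, the first pass fills positions 0 to 65535 and the second pass copies the first 34464 bytes of the file again starting at position 65536, which matches the output in the question.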

Use windows::random_access_handle together with async_read_some_at() instead. That call takes the offset as an argument, so the Offset and OffsetHigh members are set correctly. You will have to keep track of the number of bytes read yourself.

The documentation for the OVERLAPPED structure says this:

The Offset and OffsetHigh members together represent a 64-bit file position. It is a byte offset from the start of the file or file-like device, and it is specified by the user; the system will not modify these values. The calling process must set this member before passing the OVERLAPPED structure to functions that use an offset, such as the ReadFile or WriteFile (and related) functions.

There's also this part in Synchronous and Asynchronous I/O:

The system does not maintain the file pointer on asynchronous handles to files and devices that support file pointers (that is, seeking devices), therefore the file position must be passed to the read and write functions in the related offset data members of the OVERLAPPED structure. For more information, see WriteFile and ReadFile.

isanae answered Dec 28 '22 09:12


This is not a limitation of Asio, Windows, or the buffer size. Rather, Asio is doing exactly what it has been told to do within its specification: it is reading 100000 bytes from a regular file as if it were a stream. With windows::stream_handle:

  • async_read() will be composed of zero or more intermediate async_read_some() operations until either the number of bytes requested by the application has been transferred, or until an error occurs

    This operation is implemented in terms of zero or more calls to the stream's async_read_some function, and is known as a composed operation.

  • async_read_some() operations may read less than the number of requested bytes

    The read operation may not read all of the requested number of bytes.

  • each intermediate async_read_some() operation will read from the start of the stream

As the file handle being used is not truly a stream, but rather a regular file, consider using windows::random_access_handle and async_read_at(device, 0, ...). The Random-Access HANDLEs documentation notes:

Boost.Asio provides Windows-specific classes that permit asynchronous read and write operations to be performed on HANDLEs that refer to regular files.

When using windows::random_access_handle and async_read_at():

  • async_read_at() will be composed of zero or more intermediate async_read_some_at() operations until either the number of bytes requested by the application has been transferred, or until an error occurs
  • async_read_some_at() operations may read less than the number of requested bytes
  • each intermediate async_read_some_at() operation will use an offset corresponding to the end of the previous read when reading from the device (e.g. the initial offset + current bytes transferred)
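
The offset rule in the last bullet can be sketched in portable, synchronous C++. This is an illustration under assumptions, not Asio's implementation: InMemoryDevice and composed_read_at are hypothetical names, and the device's read_some_at() is capped at 65536 bytes per call to imitate short reads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory device standing in for a random-access file handle.
// read_some_at() may return fewer bytes than requested.
struct InMemoryDevice {
    std::string contents;
    std::size_t max_chunk = 65536;

    std::size_t read_some_at(std::uint64_t offset, char* dest,
                             std::size_t wanted) const
    {
        if (offset >= contents.size())
            return 0;  // past the end of the data
        const std::size_t start = static_cast<std::size_t>(offset);
        const std::size_t n =
            std::min({wanted, max_chunk, contents.size() - start});
        std::memcpy(dest, contents.data() + start, n);
        return n;
    }
};

// Composed read: every intermediate read starts where the previous one
// ended, i.e. at initial_offset + bytes transferred so far.
template <typename Device>
std::vector<char> composed_read_at(const Device& device,
                                   std::uint64_t initial_offset,
                                   std::size_t to_read)
{
    std::vector<char> buffer(to_read);
    std::size_t total = 0;
    while (total < to_read) {
        const std::size_t n = device.read_some_at(initial_offset + total,
                                                  buffer.data() + total,
                                                  to_read - total);
        assert(n <= to_read - total);
        if (n == 0)
            break;  // end of data reached before the request was satisfied
        total += n;
    }
    buffer.resize(total);
    return buffer;
}
```

Reading 100000 bytes from offset 0 takes two intermediate reads here, at offsets 0 and 65536, so the buffer matches the device contents throughout.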
Tanner Sansbury answered Dec 28 '22 09:12