
Looking for a way to force a short read in linux

Tags: c, linux, io, testing

I am looking for a way to produce short reads on Linux so I can unit test the code that handles them.

I have a number of methods which at the lower levels call pread / pread64 to read from a file within the file system. These are designed to handle situations where a short read occurs (the number of bytes read is less than the number requested).

I have seen situations where short reads occur (across networked file systems).

Ideally I would be able to create a file that would allow N bytes to be read and then a short read of M bytes would occur, followed by normal reads as expected. This would allow unit tests to point at the file / file system.

Thanks!

CoreyP asked Oct 19 '22 01:10


2 Answers

If you know the library call(s) being made that you want to intercept, you can interpose on the call(s) with a shared object loaded via LD_PRELOAD.

shortread.c:

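#define _GNU_SOURCE   // needed for RTLD_NEXT in <dlfcn.h>
#include <sys/types.h>
#include <stddef.h>   // NULL
#include <dlfcn.h>
#include <pthread.h>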

#define MAX_FDS 1024

static int short_read_array[ MAX_FDS ];

// #define these to match your system's values
// (need to be really careful with header files since
// getting open() declared would make things very
// difficult - just try this with open( const char *, int, ...);
// declared to see what I mean...)
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2

// note that the mode bits for read/write are
// not a bitwise-or - they are distinct values
#define MODE_BITS 3

// it's much easier to *NOT* even deal with the
// fact that open() is a varargs function
// but that means probably having to do some
// typedef's and #defines to get this to compile

// typedef some function pointers to make things easier
typedef int ( *open_ptr_t )( const char *name, int flags, mode_t mode );
typedef ssize_t ( *read_ptr_t )( int fd, void *buf, size_t bytes );
typedef int ( *close_ptr_t )( int fd );

// function pointers to the real IO library calls
static open_ptr_t real_open = NULL;
static read_ptr_t real_read = NULL;
static close_ptr_t real_close = NULL;

// this will return non-zero if 'filename' is a file
// to cause short reads on
static int shortReadsOnFd( const char *filename )
{
    // add logic here based on the file name to
    // return non-zero if you want to do
    // short reads on this file
    //
    // return( 1 );
    return( 0 );
}

// interpose on open()
int open( const char *filename, int flags, mode_t mode )
{
    static pthread_mutex_t open_mutex = PTHREAD_MUTEX_INITIALIZER;
    int fd;

    pthread_mutex_lock( &open_mutex );
    if ( NULL == real_open )
    {
        real_open = dlsym( RTLD_NEXT, "open" );
    }
    pthread_mutex_unlock( &open_mutex );

    fd = real_open( filename, flags, mode );
    if ( ( -1 == fd ) || ( fd >= MAX_FDS ) )
    {
        return( fd );
    }

    int mode_bits = flags & MODE_BITS;

    // if the file can be read from, check if this is a file
    // to do short reads on
    if ( ( O_RDONLY == mode_bits ) || ( O_RDWR == mode_bits ) )
    {
        short_read_array[ fd ] = shortReadsOnFd( filename );
    }

    return( fd );
}

ssize_t read( int fd, void *buffer, size_t bytes )
{
    static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;

    if ( ( fd < MAX_FDS ) && ( short_read_array[ fd ] ) )
    {
        // read fewer bytes than the caller asked for,
        // but leave 0- and 1-byte requests unchanged
        if ( bytes > 1 )
        {
            bytes /= 2;
        }
    }

    pthread_mutex_lock( &read_mutex );
    if ( NULL == real_read )
    {
        real_read = dlsym( RTLD_NEXT, "read" );
    }
    pthread_mutex_unlock( &read_mutex );

    return( real_read( fd, buffer, bytes ) );
}

int close( int fd )
{
    static pthread_mutex_t close_mutex = PTHREAD_MUTEX_INITIALIZER;

    pthread_mutex_lock( &close_mutex );
    if ( NULL == real_close )
    {
        real_close = dlsym( RTLD_NEXT, "close" );
    }
    pthread_mutex_unlock( &close_mutex );

    if ( fd < MAX_FDS )
    {
        short_read_array[ fd ] = 0;
    }

    return( real_close( fd ) );
}

Compile with something like:

gcc -shared -fPIC [-m32|-m64] shortread.c -o libshortread.so -ldl -pthread

Then:

export LD_PRELOAD=/path/to/libshortread.so

Be extremely careful with such an LD_PRELOAD - all processes in the process tree will be forced to load the library. A 32-bit process will fail to run if it has to load a 64-bit library, as will a 64-bit process that is forced to try loading a 32-bit library. You can add an init function to the source above that removes the LD_PRELOAD environment variable (or sets it to something harmless) to control that somewhat.
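A minimal sketch of such an init function, assuming GCC or Clang (the constructor attribute is a compiler extension) and using a hypothetical function name, might look like the following. The library is already mapped into the current process by the time this runs, so clearing the variable should only affect child processes started afterwards.

```c
#include <stdlib.h>

/* Hypothetical init hook: runs when the shared object is loaded.
   Removing LD_PRELOAD here keeps child processes from inheriting it;
   the interposed calls in this process stay in effect. */
__attribute__((constructor))
static void shortread_init( void )
{
    unsetenv( "LD_PRELOAD" );
}
```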

You also probably need to be careful if any application uses the O_DIRECT flag for open(). Modifying the number of bytes being read can break direct IO for some Linux file systems and/or implementations, as only page-size IO operations may be supported.

And this code only handles read(). You may also need to deal with creat(), as well as pread(), readv(), preadv(), aio_read(), and lio_listio() (and maybe a few others that I can't recall at the moment), although that's admittedly not very likely. And beware of 32-bit processes that handle large files. It's been a while since I've dealt with those, but that can get ugly as I recall.
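Since the question mentions pread(), the same interposing pattern could be extended along these lines. This is only a sketch: the helper name shortened() is made up for illustration, it shortens requests on every descriptor rather than consulting the per-fd array, and the lazy lookup is not thread-safe as written (the mutex pattern from the code above would be needed for that). Callers that use pread64() directly would need a separate wrapper.

```c
#define _GNU_SOURCE   /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>

typedef ssize_t ( *pread_ptr_t )( int fd, void *buf, size_t bytes, off_t offset );

/* hypothetical helper: halve the request, leaving 0- and 1-byte requests alone */
static size_t shortened( size_t bytes )
{
    return( ( bytes > 1 ) ? ( bytes / 2 ) : bytes );
}

/* interpose on pread(); every descriptor gets shortened reads in this sketch */
ssize_t pread( int fd, void *buf, size_t bytes, off_t offset )
{
    static pread_ptr_t real_pread = NULL;

    if ( NULL == real_pread )
    {
        real_pread = ( pread_ptr_t ) dlsym( RTLD_NEXT, "pread" );
    }

    return( real_pread( fd, buf, shortened( bytes ), offset ) );
}
```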

Another caveat is that calls such as fopen() and fread() may not call the open() and read() library calls and may issue the relevant system call directly. In that case, you won't be able to modify the behavior of those calls easily. Interposing on the entire family of STDIO-based calls that can read data, such as fgets(), can be a very difficult thing to do without breaking things.

And if you know your application(s) are single-threaded, you can drop the mutexes.
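As one possible way to fill in the shortReadsOnFd() placeholder from the code above, the file name could be matched against a suffix taken from the environment. The variable name SHORTREAD_SUFFIX is an assumption made up for this sketch, not something the loader or libc knows about.

```c
#include <stdlib.h>
#include <string.h>

/* Return non-zero when 'filename' ends with the suffix named by the
   hypothetical SHORTREAD_SUFFIX environment variable. */
static int shortReadsOnFd( const char *filename )
{
    const char *suffix = getenv( "SHORTREAD_SUFFIX" );
    size_t name_len, suffix_len;

    if ( ( NULL == filename ) || ( NULL == suffix ) || ( '\0' == *suffix ) )
    {
        return( 0 );
    }

    name_len = strlen( filename );
    suffix_len = strlen( suffix );
    if ( suffix_len > name_len )
    {
        return( 0 );
    }

    /* compare only the tail of the file name */
    return( 0 == strcmp( filename + name_len - suffix_len, suffix ) );
}
```

With something like SHORTREAD_SUFFIX=.short exported alongside LD_PRELOAD, only files whose names end in .short would be affected.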

Andrew Henle answered Oct 29 '22 13:10


In the end I went with a solution using mkfifo().

I create the named pipe then connect a writer to it (and end up wrapping it in a JNI library to be used from Java). The asynchronous writer can then be told to write data at the correct times, at which point the connected reader only gets the available / written bytes rather than the total requested number.
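The core of the FIFO idea can be sketched in a single process, without the asynchronous writer or the JNI wrapper. The helper name and path handling here are hypothetical. Note that this exercises read(); pread() on a FIFO fails with ESPIPE, so code that calls pread() directly would need a different approach.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at 'path', put 'avail' bytes in it,
   then request 'want' bytes and return whatever read() reports. */
static ssize_t fifo_short_read( const char *path, size_t avail, size_t want )
{
    char buf[ 4096 ];
    ssize_t got = -1;
    int rfd;
    int wfd = -1;

    if ( ( avail > sizeof( buf ) ) || ( want > sizeof( buf ) ) )
    {
        return( -1 );
    }
    if ( 0 != mkfifo( path, 0600 ) )
    {
        return( -1 );
    }

    /* open the read end non-blocking first so neither open() waits for a peer */
    rfd = open( path, O_RDONLY | O_NONBLOCK );
    if ( -1 != rfd )
    {
        wfd = open( path, O_WRONLY );
    }

    if ( -1 != wfd )
    {
        memset( buf, 'x', avail );
        if ( ( ssize_t ) avail == write( wfd, buf, avail ) )
        {
            /* only 'avail' bytes are in the pipe, so this read comes back short */
            got = read( rfd, buf, want );
        }
        close( wfd );
    }
    if ( -1 != rfd )
    {
        close( rfd );
    }
    unlink( path );

    return( got );
}
```

For example, fifo_short_read( "/tmp/shortread.fifo", 10, 100 ) asks for 100 bytes while only 10 have been written, so the read returns 10.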

CoreyP answered Oct 29 '22 12:10