
Parsing files over 2.15 GB in Java using Kaitai Struct

I'm parsing large PCAP files in Java using Kaitai Struct. Whenever the file size exceeds Integer.MAX_VALUE bytes, I get an IllegalArgumentException caused by the size limit of the underlying ByteBuffer.

I haven't found references to this issue elsewhere, which leads me to believe that this is not a library limitation but a mistake in the way I'm using it.

Since the problem is caused by trying to map the whole file into the ByteBuffer, I'd think the solution would be to map only the first region of the file and, as the data is consumed, map again, skipping the data already parsed.

As this is done within the Kaitai Struct runtime library, it would mean writing my own class extending KaitaiStream and overriding the auto-generated fromFile(...) method, which doesn't really seem like the right approach.
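For reference, the remapping idea described above can be sketched in plain NIO, independent of the Kaitai runtime. This is only an illustration of the sliding-window technique, not Kaitai code; the window size is an arbitrary assumption:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WindowedMap {
    // Hypothetical window size; each mapping stays well under Integer.MAX_VALUE.
    static final long WINDOW = 64 * 1024;

    // Consume a file of any size by mapping one window at a time and
    // remapping at the next offset once the current window is exhausted.
    public static long sumBytes(Path file) throws IOException {
        long total = 0;
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = fc.size();
            for (long off = 0; off < size; off += WINDOW) {
                long len = Math.min(WINDOW, size - off);
                MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, off, len);
                while (bb.hasRemaining()) {
                    total += bb.get() & 0xFF; // process and discard
                }
            }
        }
        return total;
    }
}
```

The catch, as noted, is that the Kaitai runtime hides the mapping step inside ByteBufferKaitaiStream, so there is no obvious hook for this without replacing the stream class.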

The auto-generated method to parse from a file for the PCAP class is:

public static Pcap fromFile(String fileName) throws IOException {
  return new Pcap(new ByteBufferKaitaiStream(fileName));
}

And the ByteBufferKaitaiStream provided by the Kaitai Struct Runtime library is backed by a ByteBuffer.

private final FileChannel fc;
private final ByteBuffer bb;

public ByteBufferKaitaiStream(String fileName) throws IOException {
    fc = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
    bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
}

This, in turn, is limited by ByteBuffer's maximum size.
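The limit is easy to demonstrate: FileChannel.map() rejects any requested size above Integer.MAX_VALUE up front, before even looking at the file, because MappedByteBuffer is indexed by int. A minimal check:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapLimitDemo {
    // Returns true if mapping more than Integer.MAX_VALUE bytes throws
    // IllegalArgumentException, regardless of the actual file size.
    public static boolean mapThrowsOver2Gb(Path smallFile) throws IOException {
        try (FileChannel fc = FileChannel.open(smallFile, StandardOpenOption.READ)) {
            fc.map(FileChannel.MapMode.READ_ONLY, 0, (long) Integer.MAX_VALUE + 1);
            return false;
        } catch (IllegalArgumentException expected) {
            return true; // "Size exceeds Integer.MAX_VALUE"
        }
    }
}
```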

Am I missing some obvious workaround? Is it really a limitation of the Java implementation of Kaitai Struct?

asked May 20 '19 by Julian

2 Answers

There are two separate issues here:

  1. Running Pcap.fromFile() on large files is generally not very efficient anyway, as you'll eventually get the whole file parsed into memory at once. An example of how to avoid that is given in kaitai_struct/issues/255. The basic idea is that you want control over reading every packet, and then dispose of each packet after you've parsed / accounted for it somehow.

  2. The 2 GB limit on Java's mmap'ed files. To mitigate that, you can use the alternative RandomAccessFile-based KaitaiStream implementation, RandomAccessFileKaitaiStream: it might be slower, but it should avoid the 2 GB problem.
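The per-packet idea from point 1 can be illustrated without the generated Kaitai classes at all. This sketch walks a (little-endian) pcap stream one record at a time, using only the 16-byte record header to know how much body to consume, so memory use stays constant no matter how large the capture is. Field offsets follow the pcap file format (ts_sec, ts_usec, incl_len, orig_len); this is an illustration of the technique, not the kaitai_struct/issues/255 code itself:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcapPacketStream {
    // Count packets by streaming one record at a time and discarding it,
    // instead of materializing the whole capture in memory.
    // Assumes a little-endian pcap (magic 0xa1b2c3d4 written LE).
    public static int countPackets(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        d.readFully(new byte[24]); // skip the 24-byte global header
        byte[] recHdr = new byte[16];
        int count = 0;
        while (true) {
            try {
                d.readFully(recHdr);
            } catch (EOFException eof) {
                return count; // clean end of capture
            }
            // incl_len lives at bytes 8..11 of the record header
            int inclLen = ByteBuffer.wrap(recHdr, 8, 4)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt();
            d.readFully(new byte[inclLen]); // consume the body, keep nothing
            count++;
        }
    }
}
```

With the generated classes, the equivalent is to parse each Packet from the stream yourself inside such a loop rather than letting Pcap's constructor build the full packets() list.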

answered Sep 27 '22 by GreyCat


This library provides a ByteBuffer implementation that uses long offsets. I haven't tried this approach, but it looks promising. See the section Mapping Files Bigger than 2 GB:

http://www.kdgregory.com/index.php?page=java.byteBuffer

public int getInt(long index)
{
    // Read from the segment that holds this offset, at the position
    // set by buffer().
    return buffer(index).getInt();
}

private ByteBuffer buffer(long index)
{
    // Pick the segment containing the absolute offset, then position it
    // at the offset within that segment.
    ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
    buf.position((int)(index % _segmentSize));
    return buf;
}

public MappedFileBuffer(File file, int segmentSize, boolean readWrite)
throws IOException
{
    if (segmentSize > MAX_SEGMENT_SIZE)
        throw new IllegalArgumentException(
                "segment size too large (max " + MAX_SEGMENT_SIZE + "): " + segmentSize);

    _segmentSize = segmentSize;
    _fileSize = file.length();

    RandomAccessFile mappedFile = null;
    try
    {
        String mode = readWrite ? "rw" : "r";
        MapMode mapMode = readWrite ? MapMode.READ_WRITE : MapMode.READ_ONLY;

        mappedFile = new RandomAccessFile(file, mode);
        FileChannel channel = mappedFile.getChannel();

        _buffers = new MappedByteBuffer[(int)(_fileSize / segmentSize) + 1];
        int bufIdx = 0;
        for (long offset = 0 ; offset < _fileSize ; offset += segmentSize)
        {
            long remainingFileSize = _fileSize - offset;
            // Map up to two segments' worth per buffer, so multi-byte values
            // that straddle a segment boundary can be read from one buffer.
            long thisSegmentSize = Math.min(2L * segmentSize, remainingFileSize);
            _buffers[bufIdx++] = channel.map(mapMode, offset, thisSegmentSize);
        }
    }
    finally
    {
        // close quietly
        if (mappedFile != null)
        {
            try
            {
                mappedFile.close();
            }
            catch (IOException ignored) { /* */ }
        }
    }
}
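The offset arithmetic in buffer() above is easy to check with a worked example. The 1 GiB segment size here is an assumption, chosen only to make the numbers concrete:

```java
public class SegmentMath {
    // With a 1 GiB (2^30) segment size, absolute offset 3_000_000_000
    // lands in segment 2, at position 3_000_000_000 - 2 * 2^30 = 852_516_352.
    public static int segmentOf(long index, long segmentSize) {
        return (int) (index / segmentSize);
    }

    public static int positionIn(long index, long segmentSize) {
        return (int) (index % segmentSize);
    }
}
```

Both results fit in an int as long as segmentSize itself is at most Integer.MAX_VALUE, which is why the class caps it at MAX_SEGMENT_SIZE.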
answered Sep 27 '22 by Dzmitry Bahdanovich