Downloading an object byte range from Google Cloud Storage using Java SDK

I'm trying to download a byte range from Google Cloud Storage, using their Java SDK.

I can download an entire file like this.

Storage mStorage; // initialized and working

Blob blob = mStorage.get(pBucketName, pSource);

try (ReadChannel reader = mStorage.reader(blob.getBlobId())) {
    // read bytes from read channel
}

If I want, I can call ReadChannel#seek(long) to jump to the desired starting byte and download a range from that point, but that seems inefficient (although I don't know exactly what's happening in the implementation).

Ideally I would like to specify the Range: bytes=start-end header as shown in the Google Cloud Storage REST API, but I can't figure out how to set the header in Java.

How can I specify the byte range in the Java SDK Storage get call, or specify the header, so I can efficiently download the desired byte range?

asked Nov 07 '22 16:11 by the_storyteller



2 Answers

I understand you're trying to use Google Cloud's specific interface, but there is another way that perhaps you don't know about: Google Cloud Storage can plug into Java's NIO interface. You can get a Path to a file in a bucket and use it as normal: open a SeekableByteChannel on the file, then call position(long) to move to the offset you want to read from.

Here is sample code I tested:

import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

(...)

    public static void readFromMiddle(String path, long offset, ByteBuffer buf) throws IOException {
        // Convert from a string to a path, using available NIO providers
        // so paths like gs://bucket/file are recognized (provided you included the google-cloud-nio
        // dependency).
        Path p = Paths.get(URI.create(path));
        // try-with-resources closes the channel once the read is done
        try (SeekableByteChannel chan = Files.newByteChannel(p, StandardOpenOption.READ)) {
            chan.position(offset);
            // read() may return before buf is full, so loop until it is full or the file ends
            while (buf.hasRemaining() && chan.read(buf) != -1) {
                // keep reading
            }
        }
    }


You'll recognize this is normal Java code, nothing special there except perhaps the unusual way we make the Path. That's the beauty of NIO. To make this code able to understand "gs://" URLs, you need to add the google-cloud-nio dependency. For Maven it's like this:

    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-nio</artifactId>
      <version>0.107.0-alpha</version>
    </dependency>

And that's all.

The documentation page shows how to do it for other dependency managers and gives some additional information.
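
If you want to try the position-then-read pattern without touching a bucket, here is a minimal sketch that runs against a local temporary file. The class name NioRangeDemo and the helper readRange are made up for illustration; the assumption is that the same method would work unchanged on a gs:// Path once google-cloud-nio is on the classpath, since it only uses standard NIO calls.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class NioRangeDemo {

    // Hypothetical helper: reads up to `length` bytes starting at `offset`.
    // Returns fewer bytes if the file ends before the range is complete.
    static byte[] readRange(Path path, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try (SeekableByteChannel chan = Files.newByteChannel(path, StandardOpenOption.READ)) {
            chan.position(offset);
            // read() may return before the buffer is full, so loop until full or end of file.
            while (buf.hasRemaining() && chan.read(buf) != -1) {
                // keep reading
            }
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // A local temp file stands in for the remote object in this sketch.
        Path tmp = Files.createTempFile("range-demo", ".txt");
        try {
            Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            byte[] range = readRange(tmp, 4, 6);
            System.out.println(new String(range, StandardCharsets.US_ASCII)); // prints 456789
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Allocating the buffer to exactly the range length is what bounds the read here: the loop stops as soon as the buffer is full, so nothing past offset + length is requested by this code.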

answered Nov 13 '22 16:11 by TubesHerder



The solution is to just invoke ReadChannel#seek(offset).

For example:

        try (ReadChannel reader = blob.reader()) {
            // offset and readLength are obtained from the HTTP Range header
            reader.seek(offset);
            ByteBuffer bytes = ByteBuffer.allocate(1 * 1024 * 1024);
            int len;
            // check readLength first so no extra chunk is fetched once the range is complete
            while (readLength > 0 && (len = reader.read(bytes)) > 0) {
                outputStream.write(bytes.array(), 0, (int) Math.min(len, readLength));
                bytes.clear();
                readLength -= len;
            }
        }
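
One thing to watch with a loop like this is that each read can pull a full buffer even when only a few bytes of the range remain. A possible variation, sketched below, shrinks the buffer window before every read so the copy never asks for more than the range needs. BoundedCopy and copy are hypothetical names, and the demo uses an in-memory channel as a stand-in; the assumption is that a ReadChannel can be passed in the same way after reader.seek(offset), since it is a ReadableByteChannel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class BoundedCopy {

    // Hypothetical helper: copies at most `length` bytes from `source` to `out`
    // and returns the number of bytes actually copied.
    static long copy(ReadableByteChannel source, OutputStream out, long length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
        long remaining = length;
        while (remaining > 0) {
            buf.clear();
            // Shrink the buffer window so the final read never goes past the range.
            buf.limit((int) Math.min(buf.capacity(), remaining));
            int n = source.read(buf);
            if (n < 0) {
                break; // source ended before the range was complete
            }
            out.write(buf.array(), 0, n);
            remaining -= n;
        }
        return length - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        // In-memory stand-in for a channel that has already been positioned at the offset.
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long copied = copy(ch, out, 6);
            System.out.println(copied + " bytes: "
                    + new String(out.toByteArray(), StandardCharsets.US_ASCII));
        }
    }
}
```

This only limits what the calling code reads; the client library may still prefetch in chunks internally. Newer releases of the library appear to add ReadChannel#limit(long) for capping the range on the channel itself, but check the Javadoc for the version you are on before relying on it.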
answered Nov 13 '22 17:11 by Aldrian
