I'm trying to download a byte range from Google Cloud Storage, using their Java SDK.
I can download an entire file like this:
Storage mStorage; // initialized and working
Blob blob = mStorage.get(pBucketName, pSource);
try (ReadChannel reader = mStorage.reader(blob.getBlobId())) {
    // read bytes from read channel
}
If I want, I can call ReadChannel#seek(long) to move to the desired starting byte and download a range from that point, but that seems inefficient (although I don't know exactly what happens in the implementation).
Ideally I would like to specify the Range: bytes=start-end header as shown in the Google Cloud Storage REST API, but I can't figure out how to set the header in Java.
How can I specify the byte range in the Java SDK Storage get call, or specify the header, so I can efficiently download the desired byte range?
I understand you're trying to use Google Cloud's specific interface, but there is another way that perhaps you don't know about: Google Cloud can plug into Java's NIO interface. You can get a Path to a file on a bucket and use it as normal: open a SeekableByteChannel on your file, then call its position(long) method to move to the offset you want to read from.
Here is sample code I tested:
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
(...)
    public static void readFromMiddle(String path, long offset, ByteBuffer buf) throws IOException {
        // Convert from a string to a path, using available NIO providers
        // so paths like gs://bucket/file are recognized (provided you included the google-cloud-nio
        // dependency).
        Path p = Paths.get(URI.create(path));
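        // try-with-resources closes the channel once the read is done
        try (SeekableByteChannel chan = Files.newByteChannel(p, StandardOpenOption.READ)) {
            chan.position(offset);
            // Note: a single read() may fill only part of buf
            chan.read(buf);
        }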
    }
You'll recognize this is normal Java code, nothing special there except perhaps the unusual way we make the Path. That's the beauty of NIO. To make this code able to understand "gs://" URLs, you need to add the google-cloud-nio dependency. For Maven it's like this:
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-nio</artifactId>
      <version>0.107.0-alpha</version>
    </dependency>
And that's all.
The documentation page shows how to do it for other dependency managers and gives some additional information.
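Because the method above relies only on standard NIO calls, the same position-then-read pattern can be tried on a local file before pointing it at a gs:// path. Here is a minimal sketch of that; the class name LocalRangeRead and the helper readRange are made up for illustration, and the loop is there because one read() call is not guaranteed to fill the buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of any Google library.
class LocalRangeRead {

    // Reads up to `length` bytes starting at `offset`.
    // Returns fewer bytes if the file ends before the range does.
    static byte[] readRange(Path p, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try (SeekableByteChannel chan = Files.newByteChannel(p, StandardOpenOption.READ)) {
            chan.position(offset);
            // Keep reading until the buffer is full or end-of-file (-1) is reached.
            while (buf.hasRemaining() && chan.read(buf) != -1) {
                // nothing else to do per iteration
            }
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("range-demo", ".txt");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            byte[] slice = readRange(tmp, 3, 4);
            System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints 3456
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the google-cloud-nio dependency on the classpath, the Path passed to readRange could presumably come from Paths.get(URI.create("gs://bucket/file")) instead of a temporary file.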
The solution is to just invoke ReadChannel#seek(offset).
For example:
        try (ReadChannel reader = blob.reader()) {
            // offset and readLength are obtained from the HTTP Range header
            reader.seek(offset);
            ByteBuffer bytes = ByteBuffer.allocate(1 * 1024 * 1024);
            int len = 0;
            while ((len = reader.read(bytes)) > 0 && readLength > 0) {
                outputStream.write(bytes.array(), 0, (int) Math.min(len, readLength));
                bytes.clear();
                readLength -= len;
            }
        }
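The snippet above assumes offset and readLength were already extracted from the incoming Range header. As a rough sketch of that step, here is one way to parse the simple single-range forms "bytes=start-end" and "bytes=start-". The ByteRange class and its parse method are hypothetical names, not part of the Google SDK, and suffix ranges ("bytes=-N") and multi-range headers are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class holding the values the seek() example needs.
class ByteRange {
    // Matches "bytes=start-end" and "bytes=start-" only (an assumption of this sketch).
    private static final Pattern SINGLE_RANGE = Pattern.compile("bytes=(\\d+)-(\\d*)");

    final long offset; // first byte to read
    final long length; // number of bytes to read

    ByteRange(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }

    // totalSize is the size of the object in bytes (for a Blob, something like blob.getSize()).
    static ByteRange parse(String header, long totalSize) {
        Matcher m = SINGLE_RANGE.matcher(header.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Range header: " + header);
        }
        long start = Long.parseLong(m.group(1));
        // The end position is inclusive; an empty end means "to the last byte".
        long end = m.group(2).isEmpty()
                ? totalSize - 1
                : Math.min(Long.parseLong(m.group(2)), totalSize - 1);
        if (start > end) {
            throw new IllegalArgumentException("Unsatisfiable range: " + header);
        }
        return new ByteRange(start, end - start + 1);
    }

    public static void main(String[] args) {
        ByteRange r = ByteRange.parse("bytes=100-199", 1000);
        System.out.println(r.offset + " " + r.length); // prints 100 100
    }
}
```

The resulting offset would go to reader.seek(offset) and length would play the role of readLength in the loop.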