I'm trying to download a byte range from Google Cloud Storage, using their Java SDK.
I can download an entire file like this:
Storage mStorage; // initialized and working
Blob blob = mStorage.get(pBucketName, pSource);
try (ReadChannel reader = mStorage.reader(blob.getBlobId())) {
// read bytes from read channel
}
I could call ReadChannel#seek(long) to skip to the desired starting byte and read the range from that point, but that seems inefficient (although I don't know exactly what the implementation does).
Ideally I would like to specify the Range: bytes=start-end header, as shown in the Google Cloud Storage REST API, but I can't figure out how to set the header in Java.
How can I specify the byte range in the Java SDK Storage get call, or specify the header, so I can efficiently download the desired byte range?
I understand you're trying to use Google Cloud's own interface, but there is another way you may not know about: Google Cloud Storage can plug into Java's NIO interface. You can get a Path to a file in a bucket and use it as normal: open a SeekableByteChannel on the file, then call its position(long) method to move to the offset you want to read from.
Here is sample code I tested:
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
(...)
public static void readFromMiddle(String path, long offset, ByteBuffer buf) throws IOException {
    // Convert from a string to a path, using available NIO providers
    // so paths like gs://bucket/file are recognized (provided you included the google-cloud-nio
    // dependency).
    Path p = Paths.get(URI.create(path));
    // try-with-resources closes the channel even if the read fails
    try (SeekableByteChannel chan = Files.newByteChannel(p, StandardOpenOption.READ)) {
        chan.position(offset);
        // A single read may fill only part of buf; check buf.position() afterwards
        chan.read(buf);
    }
}
You'll recognize this as ordinary Java code; the only unusual part is perhaps the way the Path is created. That's the beauty of NIO. For this code to understand "gs://" URLs, you need to add the google-cloud-nio dependency. For Maven it looks like this:
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-nio</artifactId>
<version>0.107.0-alpha</version>
</dependency>
And that's all.
The documentation page shows how to do it for other dependency managers and gives some additional information.
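If you need exactly the bytes of a range rather than whatever a single read() happens to return, the same idea can be wrapped in a small helper. This is only a sketch: RangeReader and readRange are names I made up for illustration, and it is written against plain NIO, so it can be tried on a local file. I'm assuming a gs:// Path obtained through the google-cloud-nio provider behaves the same way.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any Google library.
final class RangeReader {

    /**
     * Reads up to {@code length} bytes starting at {@code offset}.
     * The result is shorter than {@code length} only if the file ends first.
     */
    static byte[] readRange(Path path, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try (SeekableByteChannel chan = Files.newByteChannel(path, StandardOpenOption.READ)) {
            chan.position(offset);
            // read() may return fewer bytes than requested, so loop until the
            // buffer is full or the channel reports end of stream (-1).
            while (buf.hasRemaining() && chan.read(buf) != -1) {
                // nothing to do here; the buffer position advances on each read
            }
        }
        // Copy only the bytes that were actually read.
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }
}
```

Calling RangeReader.readRange(Paths.get(URI.create(path)), offset, length) would then return the requested slice, or a shorter array if the range runs past the end of the file.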
The solution is simply to call ReadChannel#seek(offset).
For example:
try (ReadChannel reader = blob.reader()) {
    // offset and readLength are obtained from the HTTP Range header
    reader.seek(offset);
    ByteBuffer bytes = ByteBuffer.allocate(1 * 1024 * 1024);
    int len;
    // Check readLength first so no extra chunk is fetched once the range is complete
    while (readLength > 0 && (len = reader.read(bytes)) > 0) {
        int toWrite = (int) Math.min(len, readLength);
        outputStream.write(bytes.array(), 0, toWrite);
        bytes.clear();
        readLength -= toWrite;
    }
}
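The snippet above assumes offset and readLength have already been extracted from the incoming Range header. A minimal sketch of that step follows. ByteRange and parse are hypothetical names, not part of the Google Cloud SDK, and the sketch only handles the single-range forms bytes=start-end and bytes=start- (HTTP byte ranges are inclusive at both ends, hence the + 1).

```java
// Hypothetical value class for illustration; not part of any Google library.
final class ByteRange {
    final long offset;
    final long length; // -1 means "read to the end of the object"

    private ByteRange(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }

    /**
     * Parses a header value such as "bytes=100-199" or "bytes=500-".
     * Suffix ranges ("bytes=-500") and multi-range requests are rejected,
     * because they need extra handling that this sketch leaves out.
     */
    static ByteRange parse(String header) {
        String prefix = "bytes=";
        if (header == null || !header.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a byte range: " + header);
        }
        String spec = header.substring(prefix.length()).trim();
        int dash = spec.indexOf('-');
        if (dash <= 0 || spec.indexOf(',') >= 0) {
            throw new IllegalArgumentException("Unsupported range: " + header);
        }
        // NumberFormatException is a subclass of IllegalArgumentException,
        // so malformed numbers are reported the same way as other bad input.
        long start = Long.parseLong(spec.substring(0, dash).trim());
        String endText = spec.substring(dash + 1).trim();
        if (endText.isEmpty()) {
            return new ByteRange(start, -1);
        }
        long end = Long.parseLong(endText);
        if (end < start) {
            throw new IllegalArgumentException("Range end before start: " + header);
        }
        return new ByteRange(start, end - start + 1);
    }
}
```

With this, ByteRange.parse("bytes=100-199") yields offset 100 and length 100, which can be passed to the loop as offset and readLength. An open-ended range (length -1) would need to be mapped to a large readLength or handled separately.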