I have a 900 MB file that I'd like to download to disk from S3 only if it hasn't already been downloaded. Is there an easy way to do that? I know S3 supports querying the MD5 checksum of files, but I'm hoping not to have to build this logic myself.
If you send the file to the existing key, it will overwrite that file once the upload is complete.
Using Amazon S3 Object Lock, you can prevent an object from being deleted or overwritten for a fixed amount of time, or until the legal hold is removed. An object version can have a retention period, a legal hold, or both.
Reading objects without downloading them: if you want to upload and read small pieces of textual data such as quotes, tweets, or news articles, you can do that using the S3 resource method put().
To download an entire bucket to your local file system, use the AWS CLI sync command, passing it the S3 bucket as the source and a directory on your file system as the destination, e.g. aws s3 sync s3://YOUR_BUCKET . (where the trailing dot is the current directory). The sync command recursively copies the contents of the source to the destination.
You can use the AWS CLI's s3 sync command.
Syncs directories and S3 prefixes. Recursively copies new and updated files from the source directory to the destination.
According to this forum thread, you can use sync to synchronize only one file:
aws s3 sync s3://bucket/path/ local/path/ --exclude "*" --include "File.txt"
It says: sync the given paths, exclude all files, but include "File.txt", so it will sync only "File.txt" under those given paths.
Or with the Java SDK:
According to the javadoc, there is a getObjectMetadata method which returns information about an S3 object (file) without downloading its contents.
The method returns an ObjectMetadata object which can give you some useful information:
getLastModified method: Gets the value of the Last-Modified header, indicating the date and time at which Amazon S3 last recorded a modification to the associated object.
getContentMD5 method: Gets the base64 encoded 128-bit MD5 digest of the associated object (content - not including headers) according to RFC 1864.
getETag method: Gets the hex encoded 128-bit MD5 digest of the associated object according to RFC 1864. Note that the ETag is only an MD5 digest of the content for objects uploaded in a single part without SSE-KMS or SSE-C encryption; an object uploaded as a multipart upload (likely for a 900 MB file) has an ETag that is not the MD5 of its content.
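As a rough illustration of how the ETag could be used to skip a download, here is a minimal sketch that uses only the JDK. The class name ETagCheck and its methods are hypothetical, not part of the AWS SDK. It assumes you have already fetched the ETag string, for example via getObjectMetadata(...).getETag(), and that the object was uploaded in a single part so the ETag really is the hex MD5 of the content. ETags containing a "-" (multipart uploads) cannot be compared this way, so the sketch conservatively reports that a download is needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of the AWS SDK.
final class ETagCheck {

    /** Streams the file through MD5 and returns the lowercase hex digest. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[1 << 16];
        // Read in chunks so a 900 MB file is never held in memory at once.
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true when the local copy is missing or cannot be shown to match the ETag. */
    static boolean needsDownload(Path localFile, String eTag)
            throws IOException, NoSuchAlgorithmException {
        if (!Files.isRegularFile(localFile)) {
            return true; // nothing on disk yet
        }
        // Strip surrounding quotes in case the raw header value was passed in.
        String normalized = eTag.replace("\"", "").toLowerCase(Locale.ROOT);
        if (normalized.contains("-")) {
            return true; // multipart ETag: not an MD5 of the content, so assume stale
        }
        return !md5Hex(localFile).equals(normalized);
    }
}
```

A caller would invoke ETagCheck.needsDownload(localPath, eTag) and start the download only when it returns true. Hashing 900 MB locally is still far cheaper than transferring it again, but it is not free; if that cost matters, comparing size and timestamp is a lighter alternative.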
I have used the code below to download S3 files whose timestamps are later than the local folder's timestamp. It first checks whether any file in the S3 folder has a timestamp later than the local folder's timestamp, and if so, downloads only those files.
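import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Date;

public class DownloadNewerFiles {
    public static void main(String[] args) throws Exception {
        String bucket = "bucket";
        AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().build();
        TransferManager transferManager =
                TransferManagerBuilder.standard().withS3Client(amazonS3).build();
        try {
            // Use the local folder's modification time as the cutoff.
            Path location = Paths.get("/data/test/");
            Date lastUpdatedTime = new Date(
                    Files.getLastModifiedTime(location, LinkOption.NOFOLLOW_LINKS).toMillis());

            // listObjects returns at most 1000 keys per call, so follow truncated listings.
            ObjectListing listing = amazonS3.listObjects(bucket, "test-folder");
            while (true) {
                for (S3ObjectSummary os : listing.getObjectSummaries()) {
                    // Download only objects modified after the local folder was.
                    if (os.getLastModified().after(lastUpdatedTime)) {
                        File target = new File("/data/test/" + os.getKey());
                        Download download = transferManager.download(bucket, os.getKey(), target);
                        download.waitForCompletion(); // blocks until done, throws on failure
                    }
                }
                if (!listing.isTruncated()) {
                    break;
                }
                listing = amazonS3.listNextBatchOfObjects(listing);
            }
        } finally {
            transferManager.shutdownNow();
        }
    }
}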
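The folder timestamp is a fairly coarse signal, so a per-file check may fit the original question better. The following is a minimal sketch under stated assumptions: the class FreshnessCheck and its method are hypothetical names, and the remote size and modification date are assumed to come from something like S3ObjectSummary.getSize() and getLastModified(). It treats a file as needing download when it is missing, has a different size, or is older than the remote object.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Date;

// Hypothetical helper, not part of the AWS SDK.
final class FreshnessCheck {

    /**
     * Returns true when the local file is absent, differs in size from the
     * remote object, or was last modified before the remote object.
     */
    static boolean needsDownload(Path localFile, long remoteSize, Date remoteLastModified)
            throws IOException {
        if (!Files.isRegularFile(localFile)) {
            return true; // not downloaded yet
        }
        if (Files.size(localFile) != remoteSize) {
            return true; // partial or different content
        }
        long localMillis = Files.getLastModifiedTime(localFile).toMillis();
        return remoteLastModified.getTime() > localMillis;
    }
}
```

This is roughly the kind of comparison aws s3 sync makes by default (size and last-modified time), though the sketch is not a reimplementation of it. It cannot detect a changed file that happens to keep the same size and an older timestamp; a checksum comparison would be needed for that.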