Is there a way to concatenate small files (each less than 5 MB) on Amazon S3? Multipart upload is not an option, because every part except the last must be at least 5 MB.
Pulling all these files down and concatenating them locally is not an efficient solution.
So, can anybody point me to an API that does this?
Small files create too much latency for data analytics. Because streaming data arrives in small files, you typically write these files to S3 as they arrive rather than combining them on write. But small files impede performance.
You can use one of several methods to merge or combine files from Amazon S3 inside Amazon QuickSight: Combine files by using a manifest – In this case, the files must have the same number of fields (columns). The data types must match between fields in the same position in the file.
Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB.
Some storage classes, such as S3 Standard-IA, bill a 128 KB minimum object size. They are backed by the Amazon S3 Service Level Agreement for availability.
Amazon S3 does not provide a concatenate function. It is primarily an object storage service.
You will need some process that downloads the objects, combines them, then uploads them again. The most efficient way to do this would be to download the objects in parallel, to take full advantage of available bandwidth. However, that is more complex to code.
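As a rough illustration of the download, combine, upload process described above, here is a minimal sketch. It assumes `s3` is a boto3 S3 client (for example `boto3.client("s3")`), that the combined data fits in memory, and that the objects should be joined in the order the keys are given. The function name `concat_objects` and its parameters are hypothetical, not part of any AWS API.

```python
from concurrent.futures import ThreadPoolExecutor


def concat_objects(s3, bucket, keys, dest_key, max_workers=8):
    """Download `keys` in parallel, join them in order, upload as `dest_key`.

    `s3` is assumed to be a boto3 S3 client; everything is held in memory,
    so this only suits a modest total size.
    """

    def fetch(key):
        # get_object returns a dict whose "Body" is a readable stream
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though downloads overlap
        parts = list(pool.map(fetch, keys))

    data = b"".join(parts)
    s3.put_object(Bucket=bucket, Key=dest_key, Body=data)
    return len(data)
```

A call might look like `concat_objects(boto3.client("s3"), "my-bucket", ["a.log", "b.log"], "merged.log")`, where the bucket and key names are placeholders. Threads are used because the work is network-bound; for larger outputs you would want to stream to a temporary file instead of joining in memory.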
I would recommend doing the processing "in the cloud" to avoid having to download the objects across the Internet. Doing it on Amazon EC2 or AWS Lambda would be more efficient and less costly.
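To make the AWS Lambda option concrete, a handler could look roughly like the sketch below. The event shape (`bucket`, `keys`, `dest_key`) and the `build_handler` helper are assumptions made for illustration, not something Lambda or S3 defines; the S3 client is passed in so the logic does not depend on how the function is deployed.

```python
def build_handler(s3):
    """Return a Lambda-style handler bound to the given S3 client.

    Assumed event shape (hypothetical):
    {"bucket": "...", "keys": ["...", "..."], "dest_key": "..."}
    """

    def handler(event, context):
        bucket = event["bucket"]
        # Read each small object in the listed order and join the bytes
        body = b"".join(
            s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            for key in event["keys"]
        )
        s3.put_object(Bucket=bucket, Key=event["dest_key"], Body=body)
        return {"dest_key": event["dest_key"], "bytes": len(body)}

    return handler
```

In the deployed module you would bind it once at import time, for example `lambda_handler = build_handler(boto3.client("s3"))`, so the client is reused across invocations. The function's role would need permission to read the source keys and write the destination key.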