EC2 provides a convenient, on-demand, scalable way to run distributable (parallelizable) processes, and S3 provides a reliable storage service.
I was trying to use EC2 nodes for an ETL & analytics process. This process needs a large amount of data (100 GB to 1 TB) ingested very quickly (several times a day), and adequate compute resources made available for a short duration.
The above design needs high-bandwidth transfer between S3 and the EC2 nodes, but as yet I haven't found a way to guarantee that.
In a private data-center setup, one can set up a fast (say, 10 Gbps) dedicated line between storage and physical nodes.
Are there any alternative services or options on AWS that can address the above requirements?
You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. Similarly, you can scale write operations by writing to multiple prefixes.
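One way to sketch this prefix-based parallelization in Python: hash each object key into one of several shard prefixes so that concurrent GETs spread across S3's per-prefix request limits. The bucket name, shard layout, and worker count below are illustrative assumptions, not a prescribed design.

```python
# Sketch: spread S3 object keys across N prefixes and read them in parallel.
# Assumes boto3 is installed and credentials are configured; bucket and
# prefix names here are hypothetical.
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_PREFIXES = 10  # each prefix gets its own S3 request-rate allowance

def prefix_for_key(key: str) -> str:
    """Deterministically map a logical key to one of NUM_PREFIXES shard prefixes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return f"shard-{h % NUM_PREFIXES:02d}/{key}"

def parallel_get(s3_client, bucket: str, keys: list) -> list:
    """Fetch many objects concurrently; each key lives under its shard prefix."""
    def fetch(key):
        resp = s3_client.get_object(Bucket=bucket, Key=prefix_for_key(key))
        return resp["Body"].read()
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(fetch, keys))

if __name__ == "__main__":
    # Show where a key would land; no network call is made here.
    print(prefix_for_key("events/2024-01-01.parquet"))
```

Writes would use the same `prefix_for_key` mapping on `put_object`, so the write load also fans out across prefixes.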
Traffic between Amazon EC2 and Amazon S3 can leverage up to 100 Gbps of bandwidth to VPC endpoints and public IPs in the same Region.
It depends, hugely, on all sorts of things: how much network activity the other EC2 instances on the same physical server are doing, the particular S3 node you're hitting at any one time, whether you're in the same Region as your S3 endpoint, etc.
You can benchmark yourself, but even then it'll vary a lot. I've gotten multiple megabytes per second at times and a couple hundred kilobytes at other times.
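Given that variance, a simple way to benchmark is to time a full download of a known object and convert to MB/s. The helper below is a minimal sketch; the bucket and key in the usage comment are hypothetical.

```python
# Sketch: measure effective throughput of a transfer by timing a callable
# that performs the whole read.
import time

def measure_throughput(read_fn, total_bytes: int) -> float:
    """Run read_fn (which should transfer total_bytes) and return MB/s."""
    start = time.perf_counter()
    read_fn()
    elapsed = time.perf_counter() - start
    return (total_bytes / 1e6) / elapsed

# Usage with boto3 (assumed installed, credentials configured):
# import boto3
# s3 = boto3.client("s3")
# size = s3.head_object(Bucket="my-bucket", Key="big.bin")["ContentLength"]
# mbps = measure_throughput(
#     lambda: s3.get_object(Bucket="my-bucket", Key="big.bin")["Body"].read(),
#     size,
# )
# print(f"{mbps:.1f} MB/s")
```

Run it several times at different hours; a single sample tells you little given the variance described above.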