How to establish a fast and reliable S3 to EC2 connection [closed]

EC2 provides a very convenient, on-demand, scalable mechanism for running distributable (parallelizable) processes, and S3 provides a reliable storage service.

I was trying to use EC2 nodes for an ETL and analytics process. This process needs a large amount of data (100 GB to 1 TB) to be ingested very quickly, several times a day, and adequate compute resources to be made available for a short duration.

This design needs:

  1. A high-bandwidth/fast connection between S3 and EC2.
  2. The S3-to-EC2 connection to be reliable, since starting nodes, pumping in data, executing processes and terminating nodes all have to be scheduled as tightly as possible, not just to save costs but also because SLAs are involved.

But so far:

  1. The only way to pull data out of S3 seems to be over HTTP, so throughput is constrained by the download bandwidth of the EC2 nodes.
  2. The data ingestion also goes over the internet, so it may be too unreliable for strict scheduling, which means adequate buffering is needed between jobs.
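S3 is indeed accessed over HTTP(S), but a single object does not have to come down a single connection: S3 honours the HTTP `Range` header, so a large object can be split into byte ranges that are fetched concurrently. A minimal sketch of that idea, assuming a caller-supplied `fetch_range` callback that performs one ranged GET; the helper names here are hypothetical, not part of any AWS SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def plan_byte_ranges(object_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, object_size) into inclusive (start, end) byte ranges,
    matching the inclusive semantics of an HTTP Range header."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size > 0")
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1
        ranges.append((start, end))
    return ranges


def parallel_download(
    object_size: int,
    part_size: int,
    fetch_range: Callable[[int, int], bytes],
    max_workers: int = 8,
) -> bytes:
    """Fetch every range concurrently and reassemble the parts in order.

    fetch_range is a hypothetical callback returning the bytes for one
    inclusive (start, end) range, e.g. one ranged GET against S3.
    """
    ranges = plan_byte_ranges(object_size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so parts come back in byte order.
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)
```

With boto3, `fetch_range` could be `lambda s, e: s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={s}-{e}")["Body"].read()`, where `s3`, `bucket` and `key` are your own client and object. boto3's managed `download_file`, tuned with `boto3.s3.transfer.TransferConfig` (`multipart_chunksize`, `max_concurrency`), performs a similar split internally. For objects near 1 TB you would write each part to disk at its offset instead of joining everything in memory.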

In a private data center, one can set up a faster dedicated line (say, 10 Gbps) between storage and the physical nodes.
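To weigh the data volumes in the question against link speeds like these, a back-of-the-envelope calculation helps. This sketch assumes decimal units (1 GB = 10^9 bytes) and an optional, assumed `efficiency` factor standing in for protocol overhead and contention; it ignores per-request latency entirely.

```python
def transfer_seconds(num_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealised time to move num_bytes over a link of link_gbps gigabits/s.

    efficiency in (0, 1] is an assumed fraction of line rate actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and 0 < efficiency <= 1")
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


GB = 10**9  # decimal gigabyte
TB = 10**12  # decimal terabyte

if __name__ == "__main__":
    for size_name, size in (("100 GB", 100 * GB), ("1 TB", TB)):
        for gbps in (1, 10, 100):
            minutes = transfer_seconds(size, gbps) / 60
            print(f"{size_name} at {gbps:>3} Gbps: {minutes:7.1f} min")
```

At full line rate, 1 TB over a 10 Gbps link takes 800 seconds (about 13 minutes) and 100 GB takes 80 seconds; real transfers will be slower, which is what the `efficiency` parameter is meant to model.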

Are there any alternative services or options in AWS that can address these requirements?

asked Jun 14 '12 20:06 by sandeepkunkunuru

People also ask

How can I speed up my Amazon S3?

You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. Similarly, you can scale write operations by writing to multiple prefixes.
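Since S3 scales request rates per prefix, the application mainly controls how keys are spread. A minimal sketch, assuming the roughly 5,500 reads per second per prefix implied by the figure above (55,000 / 10): hash each key into one of N prefixes so load spreads evenly. The helper names and the `etl/input/...` key layout are hypothetical.

```python
import hashlib
from collections import Counter

# Per-prefix GET/HEAD rate implied by the figure above (55,000 / 10 prefixes).
GET_RPS_PER_PREFIX = 5_500


def sharded_key(key: str, num_prefixes: int) -> str:
    """Prepend a stable, hash-derived shard prefix such as "3/" or "07/"
    so that keys spread evenly across num_prefixes prefixes."""
    if num_prefixes <= 0:
        raise ValueError("num_prefixes must be positive")
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_prefixes
    width = len(str(num_prefixes - 1))  # zero-pad so prefixes sort cleanly
    return f"{shard:0{width}d}/{key}"


def max_read_rps(num_prefixes: int) -> int:
    """Upper bound on aggregate GET rate, assuming load is spread evenly."""
    return num_prefixes * GET_RPS_PER_PREFIX


if __name__ == "__main__":
    keys = [f"etl/input/part-{i:05d}.csv" for i in range(10_000)]
    per_prefix = Counter(sharded_key(k, 10).split("/", 1)[0] for k in keys)
    print(sorted(per_prefix.items()))
    print("read ceiling:", max_read_rps(10), "requests/s")
```

The trade-off is that hashed prefixes make it harder to list objects by date or job; readers then need to fan out one listing or GET worker per prefix.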

How fast is S3 to EC2?

Traffic between Amazon EC2 and Amazon S3 can use up to 100 Gbps of bandwidth to VPC endpoints and public IPs in the same Region.


1 Answer

It depends, hugely, on all sorts of things: how much network activity the other EC2 instances on the same physical server are generating, the particular S3 node you're hitting at any one time, whether you're in the same region as your S3 endpoint, etc.

You can benchmark it yourself, but even then it'll vary a lot. I've gotten multiple megabytes per second at times and a couple hundred kilobytes per second at other times.
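Because the numbers swing so widely, a single measurement says little; repeated trials give a better picture. A minimal sketch, assuming the source exposes `read(n)`; the body returned by boto3's `get_object(...)["Body"]` (a botocore `StreamingBody`) does, but the stream factory is left to the caller and the helper names are hypothetical.

```python
import io
import statistics
import time
from typing import BinaryIO, Callable, List, Tuple


def measure_throughput(stream: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> Tuple[int, float]:
    """Read stream to EOF in chunks; return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def run_trials(open_stream: Callable[[], BinaryIO], trials: int = 5) -> List[float]:
    """Open a fresh stream per trial and return the observed MB/s for each."""
    rates = []
    for _ in range(trials):
        num_bytes, seconds = measure_throughput(open_stream())
        # Guard against a zero timer reading on tiny in-memory streams.
        rates.append(num_bytes / 1e6 / seconds if seconds > 0 else float("inf"))
    return rates


if __name__ == "__main__":
    # In-memory stand-in; swap in a factory that returns an S3 response body.
    rates = run_trials(lambda: io.BytesIO(b"\0" * 50_000_000), trials=3)
    print(f"min {min(rates):.1f}  median {statistics.median(rates):.1f}  max {max(rates):.1f} MB/s")
```

Reporting min, median and max rather than one number reflects the variability described above; running trials from instances in the same region as the bucket, and at different times of day, gives a fairer picture.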

answered Nov 23 '22 22:11 by ceejayoz