EC2 provides a convenient, on-demand, scalable way to run distributable (parallelizable) processes, and S3 provides a reliable storage service.
I was trying to use EC2 nodes for an ETL & analytics process. This process needs a large amount of data (100 GB to 1 TB) ingested very quickly (several times a day), and adequate compute resources made available for a short duration.
The above design needs high-bandwidth transfer between S3 and the EC2 nodes, but as yet I haven't found a way to guarantee that.
In a private data-center setup, one can set up a fast (say, 10 Gbps) dedicated line between storage and physical nodes.
Are there any alternative services or options on AWS that can address the above requirements?
You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. Similarly, you can scale write operations by writing to multiple prefixes.
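One way to sketch this prefix-based parallelization in Python: hash each object key into one of several shard prefixes so that concurrent GETs spread across S3's per-prefix request limits. The bucket name, shard layout, and worker count below are illustrative assumptions, not a prescribed design.

```python
# Sketch: spread S3 object keys across N prefixes and read them in parallel.
# Assumes boto3 is installed and credentials are configured; bucket and
# prefix names here are hypothetical.
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_PREFIXES = 10  # each prefix gets its own S3 request-rate allowance

def prefix_for_key(key: str) -> str:
    """Deterministically map a logical key to one of NUM_PREFIXES shard prefixes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return f"shard-{h % NUM_PREFIXES:02d}/{key}"

def parallel_get(s3_client, bucket: str, keys: list) -> list:
    """Fetch many objects concurrently; each key lives under its shard prefix."""
    def fetch(key):
        resp = s3_client.get_object(Bucket=bucket, Key=prefix_for_key(key))
        return resp["Body"].read()
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(fetch, keys))

if __name__ == "__main__":
    # Show where a key would land; no network call is made here.
    print(prefix_for_key("events/2024-01-01.parquet"))
```

Writes would use the same `prefix_for_key` mapping on `put_object`, so the write load also fans out across prefixes.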
Traffic between Amazon EC2 and Amazon S3 can leverage up to 100 Gbps of bandwidth to VPC endpoints and public IPs in the same Region.
It depends, hugely, on all sorts of things: how much network activity the other EC2 instances on the same physical server are doing, the particular S3 node you're hitting at any one time, whether you're in the same Region as your S3 endpoint, etc.
You can benchmark yourself, but even then it'll vary a lot. I've gotten multiple megabytes per second at times and a couple hundred kilobytes at other times.
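Given that variance, a simple way to benchmark is to time a full download of a known object and convert to MB/s. The helper below is a minimal sketch; the bucket and key in the usage comment are hypothetical.

```python
# Sketch: measure effective throughput of a transfer by timing a callable
# that performs the whole read.
import time

def measure_throughput(read_fn, total_bytes: int) -> float:
    """Run read_fn (which should transfer total_bytes) and return MB/s."""
    start = time.perf_counter()
    read_fn()
    elapsed = time.perf_counter() - start
    return (total_bytes / 1e6) / elapsed

# Usage with boto3 (assumed installed, credentials configured):
# import boto3
# s3 = boto3.client("s3")
# size = s3.head_object(Bucket="my-bucket", Key="big.bin")["ContentLength"]
# mbps = measure_throughput(
#     lambda: s3.get_object(Bucket="my-bucket", Key="big.bin")["Body"].read(),
#     size,
# )
# print(f"{mbps:.1f} MB/s")
```

Run it several times at different hours; a single sample tells you little given the variance described above.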