I am trying to persist a stream of data to S3-compatible storage. The size is not known until the stream ends and can vary from 5MB to ~500GB.
I tried different possibilities but did not find a better solution than implementing sharding myself. My best guess is to allocate a fixed-size buffer, fill it from my stream, and write each full buffer to S3. Is there a better solution? Perhaps one where this is transparent to me, without holding the whole stream in memory?
The aws-sdk-go readme has an example program that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk
When I try to pipe data in with a pipe (|), I get the following error:
failed to upload object, SerializationError: failed to compute request body size
caused by: seek /dev/stdin: illegal seek
Am I doing something wrong or is the example not working as I expect it to?
I also tried minio-go, with PutObject() or client.PutObjectStreaming(). This is functional but consumes as much memory as the data to store.
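For reference, here is roughly what I tried with minio-go (a sketch only; the endpoint, credentials, bucket, and key are placeholders, and the API shown is the v3-era client):

package main

import (
	"log"
	"os"

	minio "github.com/minio/minio-go"
)

func main() {
	// Placeholder endpoint and credentials.
	client, err := minio.New("s3.example.com", "ACCESS_KEY", "SECRET_KEY", true)
	if err != nil {
		log.Fatal(err)
	}

	// Streams from stdin, but in my tests memory usage grows to
	// roughly the size of the object being uploaded.
	n, err := client.PutObjectStreaming("mybucket", "mykey", os.Stdin)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("uploaded %d bytes", n)
}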
You can use the SDK's Uploader to handle uploads of unknown size, but you'll need to make os.Stdin "unseekable" by wrapping it in an io.Reader. This is because the Uploader, while it requires only an io.Reader as the input body, checks under the hood whether the body is also a Seeker, and if it is, it calls Seek on it. Since os.Stdin is just an *os.File, which implements the Seeker interface, you would by default get the same error you got from PutObjectWithContext.
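To illustrate the idea (a simplified sketch, not the SDK's actual code):

// Why a bare os.Stdin body triggers a Seek on a pipe:
var body io.Reader = os.Stdin
if s, ok := body.(io.Seeker); ok {
	// os.Stdin is an *os.File, so this branch is taken; seeking
	// /dev/stdin when it is a pipe fails with "illegal seek".
	_, _ = s.Seek(0, io.SeekCurrent)
}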
The Uploader also allows you to upload the data in chunks whose size you can configure, and you can likewise configure how many of those chunks should be uploaded concurrently.
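For example (PartSize and Concurrency are fields of s3manager.Uploader; the memory figure is an estimate, since each in-flight part is buffered):

uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
	u.PartSize = 20 << 20 // 20MB per part (S3's minimum is 5MB)
	u.Concurrency = 5     // up to 5 parts in flight => roughly 100MB buffered
})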
Here's a modified version of the linked example, stripped of the code that can remain unchanged:
package main

import (
	"context"
	"flag"
	"fmt"
	"io"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// reader hides any Seek method of the wrapped value (such as
// *os.File) so the Uploader treats the body as a plain stream.
type reader struct {
	r io.Reader
}

func (r *reader) Read(p []byte) (int, error) {
	return r.r.Read(p)
}

func main() {
	// Flag parsing as in the linked example.
	var bucket, key string
	flag.StringVar(&bucket, "b", "", "Bucket name.")
	flag.StringVar(&key, "k", "", "Object key name.")
	flag.Parse()

	sess := session.Must(session.NewSession())
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 20 << 20 // 20MB
		// ... more configuration
	})

	ctx := context.Background()

	_, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   &reader{os.Stdin},
	})
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to upload object, %v\n", err)
		os.Exit(1)
	}
}
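With that in place you can pipe a stream of unknown length straight through, e.g. (the binary name is hypothetical; -b and -k are the flags from the sketch above):

cat some-large-file | ./uploader -b mybucket -k mykey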
As to whether this is a better solution than minio-go, I do not know; you'll have to test that yourself.