Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

FTP/SFTP access to an Amazon S3 Bucket [closed]

Is there a way to connect to an Amazon S3 bucket with FTP or SFTP rather than the built-in Amazon file transfer interface in the AWS console? Seems odd that this isn't a readily available option.

like image 723
zgall1 Avatar asked May 29 '14 17:05

zgall1


2 Answers

There are three options.

  • You can use a native managed SFTP service recently added by Amazon (which is easier to set up).
  • Or you can mount the bucket to a file system on a Linux server and access the files using the SFTP as any other files on the server (which gives you greater control).
  • Or you can just use a (GUI) client that natively supports S3 protocol (what is free).

Managed SFTP Service

  • In your Amazon AWS Console, go to AWS Transfer for SFTP and create a new server.

  • In SFTP server page, add a new SFTP user (or users).

    • Permissions of users are governed by an associated AWS role in IAM service (for a quick start, you can use AmazonS3FullAccess policy).

    • The role must have a trust relationship to transfer.amazonaws.com.

For details, see my guide Setting up an SFTP access to Amazon S3.


Mounting Bucket to Linux Server

Just mount the bucket using s3fs file system (or similar) to a Linux server (e.g. Amazon EC2) and use the server's built-in SFTP server to access the bucket.

  • Install the s3fs

  • Add your security credentials in a form access-key-id:secret-access-key to /etc/passwd-s3fs

  • Add a bucket mounting entry to fstab:

      <bucket> /mnt/<bucket> fuse.s3fs rw,nosuid,nodev,allow_other 0 0 

For details, see my guide Setting up an SFTP access to Amazon S3.


Use S3 Client

Or use any free "FTP/SFTP client", that's also an "S3 client", and you do not have setup anything on server-side. For example, my WinSCP or Cyberduck.

WinSCP has even scripting and .NET/PowerShell interface, if you need to automate the transfers.

like image 50
Martin Prikryl Avatar answered Sep 25 '22 09:09

Martin Prikryl


Update

S3 now offers a fully-managed SFTP Gateway Service for S3 that integrates with IAM and can be administered using aws-cli.


There are theoretical and practical reasons why this isn't a perfect solution, but it does work...

You can install an FTP/SFTP service (such as proftpd) on a linux server, either in EC2 or in your own data center... then mount a bucket into the filesystem where the ftp server is configured to chroot, using s3fs.

I have a client that serves content out of S3, and the content is provided to them by a 3rd party who only supports ftp pushes... so, with some hesitation (due to the impedance mismatch between S3 and an actual filesystem) but lacking the time to write a proper FTP/S3 gateway server software package (which I still intend to do one of these days), I proposed and deployed this solution for them several months ago and they have not reported any problems with the system.

As a bonus, since proftpd can chroot each user into their own home directory and "pretend" (as far as the user can tell) that files owned by the proftpd user are actually owned by the logged in user, this segregates each ftp user into a "subdirectory" of the bucket, and makes the other users' files inaccessible.


There is a problem with the default configuration, however.

Once you start to get a few tens or hundreds of files, the problem will manifest itself when you pull a directory listing, because ProFTPd will attempt to read the .ftpaccess files over, and over, and over again, and for each file in the directory, .ftpaccess is checked to see if the user should be allowed to view it.

You can disable this behavior in ProFTPd, but I would suggest that the most correct configuration is to configure additional options -o enable_noobj_cache -o stat_cache_expire=30 in s3fs:

-o stat_cache_expire (default is no expire)

specify expire time(seconds) for entries in the stat cache

Without this option, you'll make fewer requests to S3, but you also will not always reliably discover changes made to objects if external processes or other instances of s3fs are also modifying the objects in the bucket. The value "30" in my system was selected somewhat arbitrarily.

-o enable_noobj_cache (default is disable)

enable cache entries for the object which does not exist. s3fs always has to check whether file(or sub directory) exists under object(path) when s3fs does some command, since s3fs has recognized a directory which does not exist and has files or subdirectories under itself. It increases ListBucket request and makes performance bad. You can specify this option for performance, s3fs memorizes in stat cache that the object (file or directory) does not exist.

This option allows s3fs to remember that .ftpaccess wasn't there.


Unrelated to the performance issues that can arise with ProFTPd, which are resolved by the above changes, you also need to enable -o enable_content_md5 in s3fs.

-o enable_content_md5 (default is disable)

verifying uploaded data without multipart by content-md5 header. Enable to send "Content-MD5" header when uploading a object without multipart posting. If this option is enabled, it has some influences on a performance of s3fs when uploading small object. Because s3fs always checks MD5 when uploading large object, this option does not affect on large object.

This is an option which never should have been an option -- it should always be enabled, because not doing this bypasses a critical integrity check for only a negligible performance benefit. When an object is uploaded to S3 with a Content-MD5: header, S3 will validate the checksum and reject the object if it's corrupted in transit. However unlikely that might be, it seems short-sighted to disable this safety check.

Quotes are from the man page of s3fs. Grammatical errors are in the original text.

like image 20
Michael - sqlbot Avatar answered Sep 23 '22 09:09

Michael - sqlbot