
What does s3fs cache in /tmp?

I'm using s3fs to mount an S3 bucket and upload a lot of files to it. It works fine, except that my local disk usage is also growing a lot (the contents of the /tmp directory).

My command is:

$ su ec2-user -c '/usr/bin/s3fs my-bucket-name -o use_cache=/tmp /home/ec2-user/dir'

I'm using the use_cache parameter, but what is actually cached? Are these files that still need to be uploaded to S3 and are cached on my local machine in the meantime? Can I safely delete them while the bucket is mounted or an upload is in progress? And if the cache serves some other purpose, will my uploads go faster if I turn it off?

DenCowboy asked Jan 21 '19 14:01



1 Answer

From the s3fs wiki (which is a bit hard to find):

If enabled via the "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on S3, it first downloads the entire file locally to the folder specified by use_cache and operates on it. When FUSE release() is called, s3fs will re-upload the file to S3 if it has been changed. s3fs uses MD5 checksums to minimize downloads from S3. Note: this is different from the stat cache (see below).

Local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
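
To make the checksum comparison concrete, here is a minimal shell sketch of the idea, not s3fs's actual code. It assumes the object was uploaded in a single part, where the ETag is the hex MD5 of the content (this does not hold for multipart uploads, for example). The helper name etag_matches is hypothetical.

```shell
# Sketch only: etag_matches is a hypothetical helper, not part of s3fs.
# Assumes a single-part upload, where the S3 ETag equals the hex MD5 of the body.
# Usage: etag_matches FILE ETAG   (exit status 0 if they match)
etag_matches() (
    file=$1
    # ETag header values are normally wrapped in double quotes; strip them.
    etag=$(printf '%s' "$2" | tr -d '"')
    local_md5=$(md5sum < "$file" | cut -d' ' -f1)
    [ "$local_md5" = "$etag" ]
)
```

If the local checksum matches the ETag, a cached copy can be reused without downloading the object again; otherwise the copy is stale and has to be fetched again.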

The folder specified by use_cache is just a local cache. It can be deleted at any time; s3fs rebuilds it on demand. Note: this directory grows without bound and can fill up the file system, depending on the size of the bucket and how much of it is read.
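
Since the cache can be deleted at any time, one way to keep it from filling the disk is to prune old files periodically, for example from cron. The following is a rough sketch under stated assumptions: prune_cache is a hypothetical helper, it relies on GNU find (for -delete), and the cache path you pass in depends on your own use_cache setting.

```shell
# Sketch only: prune_cache is a hypothetical helper, not an s3fs feature.
# Usage: prune_cache DIR DAYS
# Deletes regular files under DIR whose last modification is more than
# DAYS full days old. Directories are left in place.
prune_cache() (
    dir=$1
    days=$2
    [ -d "$dir" ] || exit 1
    # -xdev keeps find on the same file system as DIR.
    find "$dir" -xdev -type f -mtime +"$days" -delete
)
```

For the mount in the question, a call might look like `prune_cache /tmp/my-bucket-name 7`, assuming s3fs keeps the cache in a subdirectory named after the bucket; check where the files actually land before pointing this at a directory. The s3fs man page also lists del_cache and ensure_diskfree options that are worth checking for your version.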

bwest answered Sep 19 '22 14:09