 

Use pathlib for S3 paths


I would like to build some functionality to move files between S3 and my local file system, but pathlib appears to combine repeated slashes, breaking my aws-cli functionality:

```python
>>> from pathlib import Path
>>> str(Path('s3://loc'))
's3:/loc'
```

How can I manipulate S3 paths in this way?

beardc asked Mar 02 '18 22:03




1 Answer

Using s3path package

The s3path package makes working with S3 paths a little less painful. It is installable from PyPI or conda-forge. Use the S3Path class for actual objects in S3, and otherwise use PureS3Path, which does not access S3.

Although the previous answer by metaperture did mention this package, it didn't include the URI syntax.

Also be aware that this package has certain deficiencies, which are reported in its issue tracker.

```python
>>> from s3path import PureS3Path
>>> PureS3Path.from_uri('s3://mybucket/foo/bar') / 'add/me'
PureS3Path('/mybucket/foo/bar/add/me')
>>> _.as_uri()
's3://mybucket/foo/bar/add/me'
```

Note the instance relationships to pathlib:

```python
>>> from pathlib import Path, PurePath
>>> from s3path import S3Path, PureS3Path
>>> isinstance(S3Path('/my-bucket/some/prefix'), Path)
True
>>> isinstance(PureS3Path('/my-bucket/some/prefix'), PurePath)
True
```

Using pathlib.Path

This is a lazier version of kichik's answer that uses only pathlib. It is not necessarily the recommended approach; it just shows that urllib.parse is not always needed.

```python
>>> from pathlib import Path
>>> orig_s3_path = 's3://mybucket/foo/bar'
>>> orig_path = Path(*Path(orig_s3_path).parts[1:])
>>> orig_path
PosixPath('mybucket/foo/bar')
>>> new_path = orig_path / 'add/me'
>>> new_s3_path = 's3://' + str(new_path)
>>> new_s3_path
's3://mybucket/foo/bar/add/me'
```
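The session above relies on `Path` being a `PosixPath`. A variant that takes the urllib.parse route mentioned above and does not depend on the host OS could look like the following minimal sketch, assuming URIs of the form `s3://bucket/key`. `urlsplit` puts the bucket in `netloc` and the key in `path`, and `PurePosixPath` always joins with `/`. The helper names `s3_split` and `s3_join` are hypothetical. One caveat: `urlsplit` treats `?` and `#` as query and fragment delimiters, so keys containing those characters would be split incorrectly.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlsplit


# Hypothetical helpers (not part of pathlib or any S3 library).
def s3_split(uri: str) -> Tuple[str, PurePosixPath]:
    """Split 's3://bucket/key' into (bucket, key as a PurePosixPath)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # urlsplit puts the bucket in netloc and the key, with a leading '/', in path.
    return parts.netloc, PurePosixPath(parts.path.lstrip("/"))


def s3_join(uri: str, *segments: str) -> str:
    """Append path segments to an S3 URI and return a new s3:// URI."""
    bucket, key = s3_split(uri)
    # PurePosixPath always joins with '/', regardless of the host OS.
    return f"s3://{bucket}/{key.joinpath(*segments)}"


print(s3_join("s3://mybucket/foo/bar", "add/me"))  # s3://mybucket/foo/bar/add/me
```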

Using os.path.join

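For simple joins only, how about os.path.join? Note that this relies on os.path being posixpath, as on Linux and macOS; on Windows, os.path.join inserts a backslash instead.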

```python
>>> import os
>>> os.path.join('s3://mybucket/foo/bar', 'add/me')
's3://mybucket/foo/bar/add/me'
>>> os.path.join('s3://mybucket/foo/bar/', 'add/me')
's3://mybucket/foo/bar/add/me'
```
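Because `os.path` is `posixpath` on Linux and macOS but `ntpath` on Windows, one option is to call `posixpath` directly so `/` is used on every platform. This is a minimal sketch; the helper name `s3_path_join` is hypothetical. It also strips leading slashes from the appended segments, because `posixpath.join` discards everything before an absolute segment.

```python
import posixpath


# Hypothetical helper (not part of os.path or any S3 library).
def s3_path_join(base: str, *segments: str) -> str:
    """Join segments onto an s3:// URI using '/' on every platform."""
    # posixpath.join drops everything before an absolute segment, e.g.
    # posixpath.join('s3://mybucket/foo', '/bar') == '/bar', so strip leading '/'.
    cleaned = [seg.lstrip("/") for seg in segments]
    return posixpath.join(base, *cleaned)


print(s3_path_join("s3://mybucket/foo/bar", "add/me"))   # s3://mybucket/foo/bar/add/me
print(s3_path_join("s3://mybucket/foo/bar", "/add/me"))  # s3://mybucket/foo/bar/add/me
```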

However, os.path.normpath cannot be used naively:

```python
>>> os.path.normpath('s3://mybucket/foo/bar')  # Converts 's3://' to 's3:/'
's3:/mybucket/foo/bar'
```
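If normalization is wanted anyway, one option is to apply `posixpath.normpath` to the key only, leaving `s3://bucket` untouched. This is a minimal sketch with a hypothetical `s3_normpath` helper, assuming URIs of the form `s3://bucket/key`. Be aware that S3 keys are flat strings, so `foo//bar`, `foo/bar` and `foo/bar/` name different objects; normalizing can change which object a URI refers to.

```python
import posixpath

S3_PREFIX = "s3://"


# Hypothetical helper (not part of os.path or any S3 library).
def s3_normpath(uri: str) -> str:
    """Apply posixpath.normpath to the key only, keeping 's3://bucket' intact."""
    if not uri.startswith(S3_PREFIX):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Split 'bucket/key...' at the first '/' (the key may be empty).
    bucket, _, key = uri[len(S3_PREFIX):].partition("/")
    if not key:
        return S3_PREFIX + bucket
    # Collapses '//', resolves '.' and '..', and drops any trailing '/'.
    norm_key = posixpath.normpath(key)
    if norm_key == ".":
        return S3_PREFIX + bucket
    if norm_key == ".." or norm_key.startswith("../"):
        raise ValueError(f"key escapes the bucket: {uri!r}")
    return f"{S3_PREFIX}{bucket}/{norm_key}"


print(s3_normpath("s3://mybucket/foo//bar/./baz/"))  # s3://mybucket/foo/bar/baz
```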
Asclepius answered Oct 22 '22 07:10