I would like to build some functionality to move files between S3 and my local file system, but pathlib
appears to combine repeated slashes, breaking my aws-cli functionality:
    >>> from pathlib import Path
    >>> str(Path('s3://loc'))
    's3:/loc'
How can I manipulate S3 paths in this way?
s3path package
The s3path package makes working with S3 paths a little less painful. It is installable from PyPI or conda-forge. Use the S3Path class for actual objects in S3, and otherwise use PureS3Path, which shouldn't actually access S3.
Although the previous answer by metaperture did mention this package, it didn't include the URI syntax.
Also be aware that this package has certain deficiencies which are reported in its issues.
    >>> from s3path import PureS3Path
    >>> PureS3Path.from_uri('s3://mybucket/foo/bar') / 'add/me'
    PureS3Path('/mybucket/foo/bar/add/me')
    >>> _.as_uri()
    's3://mybucket/foo/bar/add/me'
Note the instance relationships to pathlib:
    >>> from pathlib import Path, PurePath
    >>> from s3path import S3Path, PureS3Path
    >>> isinstance(S3Path('/my-bucket/some/prefix'), Path)
    True
    >>> isinstance(PureS3Path('/my-bucket/some/prefix'), PurePath)
    True
pathlib.Path
This is a lazier version of the answer by kichik using only pathlib. This approach is not necessarily recommended. It's just not always entirely necessary to use urllib.parse.
    >>> from pathlib import Path
    >>> orig_s3_path = 's3://mybucket/foo/bar'
    >>> orig_path = Path(*Path(orig_s3_path).parts[1:])
    >>> orig_path
    PosixPath('mybucket/foo/bar')
    >>> new_path = orig_path / 'add/me'
    >>> new_s3_path = 's3://' + str(new_path)
    >>> new_s3_path
    's3://mybucket/foo/bar/add/me'
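For comparison, the urllib.parse route mentioned above could look roughly like the sketch below. This is not kichik's exact code: join_s3_uri is a hypothetical helper name, and PurePosixPath is used instead of Path on the assumption that the result should not depend on the host OS (on Windows, Path would produce backslashes).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def join_s3_uri(uri, *parts):
    """Append relative path components to an s3:// URI (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Treat the key as a pure POSIX path so that slashes are never turned
    # into backslashes, whatever the host OS is.
    key = PurePosixPath(parsed.path.lstrip("/")).joinpath(*parts)
    return f"s3://{parsed.netloc}/{key}"


print(join_s3_uri("s3://mybucket/foo/bar", "add/me"))
```

The sketch assumes at least one relative component is passed; a bucket-only URI with no components, or a component starting with a slash, would need extra handling.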
os.path.join
For simple joins only, how about os.path.join?
    >>> import os
    >>> os.path.join('s3://mybucket/foo/bar', 'add/me')
    's3://mybucket/foo/bar/add/me'
    >>> os.path.join('s3://mybucket/foo/bar/', 'add/me')
    's3://mybucket/foo/bar/add/me'
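Two caveats apply to this kind of join. On Windows, os.path.join inserts backslashes, and on any platform a component that starts with a slash discards everything before it. The sketch below uses the standard-library posixpath module (the implementation behind os.path on POSIX systems) so that it behaves the same everywhere; s3_join is a hypothetical helper name, not part of any library.

```python
import posixpath


def s3_join(base, *parts):
    """Join with forward slashes on every OS (hypothetical helper)."""
    # Strip leading slashes so that a component such as '/add/me' is
    # appended to the base instead of replacing it, which is what join()
    # does with absolute components.
    return posixpath.join(base, *(p.lstrip("/") for p in parts))


# An absolute component makes join() throw away the base entirely.
print(posixpath.join("s3://mybucket/foo/bar", "/add/me"))
# The helper keeps the base.
print(s3_join("s3://mybucket/foo/bar", "/add/me"))
```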
However, os.path.normpath cannot be used naively:
    >>> os.path.normpath('s3://mybucket/foo/bar')  # Converts 's3://' to 's3:/'
    's3:/mybucket/foo/bar'
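If normalization is still wanted, one possible workaround is to split off the scheme and normalize only what follows it. This is a minimal sketch under that assumption; normalize_s3_uri is a hypothetical helper name. Keep in mind that S3 keys are opaque strings, so 'foo//bar' and 'foo/bar' are different objects, and normalizing a key may point it at a different object.

```python
import posixpath

S3_SCHEME = "s3://"


def normalize_s3_uri(uri):
    """Normalize the part after 's3://' and leave the scheme alone (hypothetical helper)."""
    if not uri.startswith(S3_SCHEME):
        raise ValueError(f"not an S3 URI: {uri!r}")
    rest = uri[len(S3_SCHEME):]
    # posixpath.normpath collapses repeated slashes and resolves '..'
    # segments, using forward slashes on every OS.
    return S3_SCHEME + posixpath.normpath(rest)


print(normalize_s3_uri("s3://mybucket/foo//bar/../baz"))
```

The sketch does not guard against '..' segments that climb above the bucket name, so it is only suitable for input that is already trusted.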