Historically I have always used the following for reading files in Python:
with open("file", "r") as f:
    for line in f:
        pass  # do thing to line
Is this still the recommended approach? Are there any drawbacks to using the following:
from pathlib import Path
path = Path("file")
for line in path.open():
    pass  # do thing to line
Most of the references I found use the with keyword when opening files, for the convenience of not having to close the file explicitly. Does this also apply to the iterator approach here?
with open() docs
The pathlib module of Python makes it easy to deal with file paths. The os.path module can also be used to handle path name operations. The difference is that os.path works with strings that represent file paths, whereas pathlib creates Path objects.
With pathlib, file paths can be represented by proper Path objects instead of plain strings. These objects make code dealing with file paths easier to read, especially because / is used to join paths together, and more powerful, with most necessary methods and properties available directly on the object.
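As a small illustration of the / operator and the properties mentioned above, here is a minimal sketch. It uses PurePosixPath so that the output is the same on every platform; the directory and file names are made up for the example.

```python
import posixpath
from pathlib import PurePosixPath

# String-based approach: os.path-style functions take and return plain strings.
joined_str = posixpath.join("/tmp", "reports", "summary.txt")

# Object-based approach: the / operator joins path components.
joined_path = PurePosixPath("/tmp") / "reports" / "summary.txt"

print(joined_str)          # /tmp/reports/summary.txt
print(joined_path)         # /tmp/reports/summary.txt
print(joined_path.name)    # summary.txt
print(joined_path.suffix)  # .txt
print(joined_path.parent)  # /tmp/reports
```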
Something that wasn't mentioned yet: if all you wanted to do was read or write some text (or bytes) then you no longer need to use the context manager explicitly when using pathlib:
>>> import pathlib
>>> path = pathlib.Path("/tmp/example.txt")
>>> path.write_text("hello world")
11
>>> path.read_text()
'hello world'
>>> path.read_bytes()
b'hello world'
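If the file is small enough to hold in memory, the same idea covers line-by-line processing: read_text() opens and closes the file internally, and splitlines() splits the result into lines. A minimal sketch, assuming a throwaway file in a temporary directory (the file name and contents are made up):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file for the demo
    path.write_text("first\nsecond\nthird\n")

    # read_text() opens the file, reads it fully, and closes it again,
    # so there is no file handle left for us to manage.
    lines = path.read_text().splitlines()

print(lines)  # ['first', 'second', 'third']
```

The trade-off is that the whole file is loaded at once, unlike iterating over an open file handle.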
Opening a file to iterate lines should still use a with-statement, for all the same reasons as using the context manager with open, as the docs show:
>>> with path.open() as f:
...     for line in f:
...         print(line)
...
hello world
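One of those reasons is that the file gets closed even when the loop body raises. A minimal sketch of that behaviour, using a temporary file with made-up contents and a deliberately raised ValueError:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file for the demo
    path.write_text("good\nbad\nnever reached\n")

    try:
        with path.open() as f:
            for line in f:
                if line.strip() == "bad":
                    # Simulate a failure in the middle of processing.
                    raise ValueError("cannot process this line")
    except ValueError:
        pass

    # The with-statement closed the file on the way out of the block,
    # even though the block was left through an exception.
    closed_after_error = f.closed

print(closed_after_error)  # True
```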
Keep in mind that a Path object is for working with filesystem paths. Just as with the built-in open function, a Path object has an open method but no close. The .close method belongs to the file handle that is returned by either the built-in open or the Path object's open method:
>>> from pathlib import Path
>>> p=Path('/tmp/file')
>>> p
PosixPath('/tmp/file')
You can open that Path object either with the built-in open function or the open method in the Path object:
>>> fh=open(p) # open built-in function
>>> fh
<_io.TextIOWrapper name='/tmp/file' mode='r' encoding='UTF-8'>
>>> fh.close()
>>> fh=p.open() # Path.open method, which calls the built-in open under the hood
>>> fh
<_io.TextIOWrapper name='/tmp/file' mode='r' encoding='UTF-8'>
>>> fh.close()
You can have a look at the source code for pathlib on GitHub as an indication of how the authors of pathlib do it in their own code.
What I observe is one of three things.
The most common by far is to use with:
from pathlib import Path
p=Path('/tmp/file')
# create a file
with p.open(mode='w') as fi:
    fi.write(f'Insides of: {p}')
# read it back and test open or closed
with p.open(mode='r') as fi:
    print(f'{fi.read()} closed?:{fi.closed}')
# prints Insides of: /tmp/file closed?:False
As you likely know, at the end of the with block the __exit__ method is called. For a file, that means the file is closed. This is the most common approach in the pathlib source code.
Second, you can also see in the source that a pathlib object maintains an entry and exit status and a flag for being open or closed. The os.close function is not explicitly called, however. For a file handle, you can check the open or closed status with the .closed attribute.
fh=p.open()
print(f'{fh.read()} closed?:{fh.closed}')
# prints Insides of: /tmp/file closed?:False
# fh will only be closed when fh goes out of scope...
# or you could (and should) do fh.close()
with p.open() as fi:
    pass
print(f'closed?:{fi.closed}')
# fi is still in scope but was implicitly closed at the end of the with block
# prints closed?:True
Third, on CPython, files are closed when the file handle is garbage collected, which with reference counting usually happens as soon as the handle goes out of scope. Relying on this is not portable and is not considered good practice, but it is commonly done. There are instances of this in the pathlib source code.
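When a with block does not fit (for example, because the handle has to be kept around and closed later), an explicit try/finally gives the same guarantee without depending on garbage collection. A minimal sketch, using a temporary file with made-up contents:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "file"  # hypothetical file for the demo
    p.write_text("Insides of the file")

    fh = p.open()
    try:
        contents = fh.read()
        open_during_read = not fh.closed
    finally:
        # Runs whether or not the read raised, on any Python implementation.
        fh.close()

print(contents)          # Insides of the file
print(open_during_read)  # True
print(fh.closed)         # True
```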
pathlib is an object-oriented way of manipulating filesystem paths.
The recommended way of opening a file using the pathlib module is with a context manager:
p = Path("my_file.txt")
with p.open() as f:
    f.readline()
This ensures the file is closed after use.
In the pathlib example you provided, you are not closing the file because you open it in place.
Since p.open() returns a file object, you can test this by assigning it to a variable and checking its closed attribute, like so:
from pathlib import Path
path = Path("file.txt")
# Open the file pointed to by this path and return a file object, as
# the built-in open() function does.
f = path.open()
for line in f:
    pass  # do some stuff
print(f.closed)  # Evaluates to False.
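One way to keep the loop as short as the one in the question while still closing the file is to hide the with block inside a small generator. The helper name iter_lines is hypothetical and not part of pathlib; this is a sketch of one option, not the only way to do it.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_lines(path: Path) -> Iterator[str]:
    """Yield lines from path, closing the file once iteration finishes.

    Hypothetical helper for illustration; not part of pathlib.
    """
    with path.open() as f:
        for line in f:
            yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.txt"  # hypothetical file for the demo
    path.write_text("alpha\nbeta\n")

    collected = [line for line in iter_lines(path)]

print(collected)  # ['alpha', 'beta']
```

Note that if the caller stops iterating early, the file stays open until the generator is closed or garbage collected, so a plain with block is still the safer default.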