I want to use the Haskell function
readFile :: FilePath -> IO String
to read the content of a file into a string. In the documentation I have read that "The file is read lazily, on demand, as with getContents."
I am not sure I understand this completely. For example, suppose that I write
s <- readFile "t.txt"
When this action is executed:
- [...] (e.g. if I evaluate length s, all the content of the file will be read and the file will be closed).
- [...] readFile is closed (automatically).
Is my third statement correct? So, can I just invoke readFile without closing the file handle myself? Will the handle stay open as long as I have not consumed (visited) the whole result string?
EDIT
Here is some more information regarding my doubts. Suppose I have the following:
foo :: String -> IO String
foo filename = do
    s <- readFile "t.txt"
    putStrLn "File has been read."
    return s
When the putStrLn is executed, I would (intuitively) expect that
- s contains the whole content of file t.txt,
- The handle used to read the file has been closed.
If this is not the case:
- What does s contain when putStrLn is executed?
- In what state is the file handle when putStrLn is executed?
- If, when putStrLn is executed, s does not contain the whole content of the file, when will this content actually be read, and when will the file be closed?
Is my third statement correct?
Not quite. The file is not closed "as soon as the last character has been read", at least not usually. It lingers for a few moments in the semi-closed state it was in during the read, and the IO-manager/runtime closes it when it next performs such actions. If you're rapidly opening and reading files, that lag may cause you to run out of file handles if the OS limit isn't too high.
For most use cases (in my limited experience), however, the closing of the file handle is timely enough. [There are people who disagree and view lazy IO as extremely dangerous in all cases. It definitely has pitfalls, but IMO its dangers are often overstated.]
So, can I just invoke readFile without closing the file handle myself?
Yes, when you're using readFile, the file handle is closed automatically when the file contents have been entirely read or when it is noticed that the file handle is not referenced anymore.
Will the handle stay open as long as I have not consumed (visited) the whole result string?
Not quite; readFile puts the file handle in a semi-closed state, described in the docs for hGetContents:
"Computation hGetContents hdl returns the list of characters corresponding to the unread portion of the channel or file managed by hdl, which is put into an intermediate state, semi-closed. In this state, hdl is effectively closed, but items are read from hdl on demand and accumulated in a special list returned by hGetContents hdl."
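To make the semi-closed state a bit more tangible, here is a minimal sketch that asks the handle about its own state right after hGetContents. The helper name handleStateAfterGetContents is made up for illustration; hIsOpen and hIsClosed are the standard System.IO queries.

```haskell
import System.IO

-- | Hypothetical helper: open a file, call hGetContents, and report what
-- hIsOpen and hIsClosed say about the handle before any character is demanded.
handleStateAfterGetContents :: FilePath -> IO (Bool, Bool)
handleStateAfterGetContents path = do
    h <- openFile path ReadMode
    _contents <- hGetContents h   -- h is now semi-closed; nothing is demanded yet
    open   <- hIsOpen h           -- a semi-closed handle no longer counts as open
    closed <- hIsClosed h         -- but it is not fully closed either
    hClose h                      -- explicit cleanup; allowed on a semi-closed handle
    return (open, closed)
```

With GHC I would expect this to return (False, False) for an existing file: the handle is neither open nor closed, which is exactly the in-between state the documentation describes.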
foo :: String -> IO String
foo filename = do
    s <- readFile "t.txt"
    putStrLn "File has been read."
    return s
[Retracted: Ah, that's one of the pitfalls of lazy IO on the other end. Here the file is closed before its contents have been read. When foo returns, the file handle isn't referenced anymore, and is then closed. The consumer of foo's result will then find that s is an empty string, because when hGetContents tries to actually read from the file, the handle is already closed.]
I confused the behaviour of readFile with that of
bracket (openFile file ReadMode) hClose hGetContents
there. readFile only closes the file handle after s is not referenced anymore, so it behaves as expected here.
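A small sketch of that behaviour, assuming a variant of foo that actually uses its argument (the question's version always reads "t.txt"). The caller countChars is a made-up name; it only exists to show that consuming the result after foo has returned still works.

```haskell
-- | Variant of foo from the question; unlike the original it reads the
-- file named by its argument rather than the hard-coded "t.txt".
foo :: FilePath -> IO String
foo filename = do
    s <- readFile filename            -- nothing has been demanded from the file yet
    putStrLn "File has been read."    -- printed before any character is consumed
    return s                          -- the caller now holds the lazy string

-- | Hypothetical caller: consuming the result is what triggers the reading.
countChars :: FilePath -> IO Int
countChars path = do
    s <- foo path
    return $! length s                -- length walks the whole string
```

For a file containing "hello", countChars should give 5, even though the message is printed before a single character has been demanded.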
When the putStrLn is executed, I would (intuitively) expect that
- s contains the whole content of file t.txt,
- The handle used to read the file has been closed.
No, s does not contain anything yet but a recipe to maybe get some characters from the file handle. The file handle is semi-closed, but not closed. It will be closed when the file contents have been entirely read, or when s goes out of scope.
If this is not the case:
- What does s contain when putStrLn is executed?
- In what state is the file handle when putStrLn is executed?
- If, when putStrLn is executed, s does not contain the whole content of the file, when will this content actually be read, and when will the file be closed?
The first two questions have been answered. The answer to the third is "the file will be read when the contents are consumed", and it will be closed when the entire contents have been read or when it is no longer referenced.
That would be different with the above bracket invocation: bracket guarantees that the final operation, here the hClose, will be run even if the other actions throw an exception, and therefore its use is often recommended. However, the hClose is run when bracket returns, and then the hGetContents can't get any contents from the now really closed file handle. readFile, on the other hand, would not necessarily close the file handle if an exception occurs.
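Here is a hedged sketch of that bracket pitfall. The names readTooLate and probeTooLate are invented for illustration. What the consumer sees seems to depend on the GHC version: the text above describes getting no contents, while recent versions, as far as I know, raise an IOException ("delayed read on closed handle") when the string is finally demanded. The probe therefore reports either outcome instead of assuming one.

```haskell
import Control.Exception (IOException, bracket, evaluate, try)
import System.IO

-- | The problematic pattern: hClose runs when bracket returns,
-- before anything has demanded the lazily produced string.
readTooLate :: FilePath -> IO String
readTooLate path = bracket (openFile path ReadMode) hClose hGetContents

-- | Hypothetical probe: try to consume the string and report what happens.
-- Left means the delayed read raised an IOException; Right n is the number
-- of characters that could still be obtained.
probeTooLate :: FilePath -> IO (Either IOException Int)
probeTooLate path = do
    s <- readTooLate path
    try (evaluate (length s))
```

Either way, the contents of a non-empty file are lost: the result is a Left, or a Right 0.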
That is one of the dangers or quirks of lazy IO: files are not read until their contents are demanded, and if you use lazy IO wrongly, that will be too late and you don't get any contents.
It's a trap many (or even most) fall into at one time or another, but after having been bitten by it, one quickly learns when IO needs to be non-lazy and does it non-lazily in those cases.
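One way to do it non-lazily, sketched with base only: force the whole string while the handle is still usable, and let bracket close the handle afterwards. readFileStrict and readAllStrict are hypothetical helper names, not library functions.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

-- | Hypothetical helper: read a whole file and close it before returning.
readFileStrict :: FilePath -> IO String
readFileStrict path =
    bracket (openFile path ReadMode) hClose $ \h -> do
        s <- hGetContents h
        _ <- evaluate (length s)   -- walk the whole string while h is still usable
        return s

-- | Read many files one after another; each handle is closed
-- before the next file is opened.
readAllStrict :: [FilePath] -> IO [String]
readAllStrict = mapM readFileStrict
```

The price is that every file is held in memory in full, so this fits the "rapidly opening and reading files" situation better than the "stream one huge file" situation.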
The alternatives (iteratees, enumerators, conduits, pipes, ...) avoid those traps [unless the implementer made a mistake], but are considerably less nice to use in those cases where lazy IO is perfectly fine. On the other hand, they treat the cases where laziness is not desired much better.
When the putStrLn is executed, I would (intuitively) expect that s contains the whole content of file t.txt,
You need to keep in mind that you're using lazy IO here. Reading from the file merely creates an unevaluated string computation that will read the file only if its value is later required.
By using lazy IO you defer your IO until the value is needed.
Once the last character of your file has been read, or all references to the open file are dropped (e.g. your s value), your open file will be closed by the garbage collector.