Do Haskell files close automatically after readFile?

I want to use the Haskell function

readFile :: FilePath -> IO String

to read the content of a file into a string. In the documentation I have read that "The file is read lazily, on demand, as with getContents."

I am not sure I understand this completely. For example, suppose that I write

s <- readFile "t.txt"

When this action is executed:

  • The file is opened.
  • The characters in s are actually read from the file as soon as (but not sooner than) they are needed to evaluate some expression (e.g. if I evaluate length s, all the content of the file will be read and the file will be closed).
  • As soon as the last character has been read, the file handle associated to this call to readFile is closed (automatically).

Is my third statement correct? So, can I just invoke readFile without closing the file handle myself? Will the handle stay open as long as I have not consumed (visited) the whole result string?

EDIT

Here is some more information regarding my doubts. Suppose I have the following:

foo :: String -> IO String
foo filename = do
                  s <- readFile "t.txt"
                  putStrLn "File has been read."
                  return s

When the putStrLn is executed, I would (intuitively) expect that

  1. s contains the whole content of file t.txt,
  2. The handle used to read the file has been closed.

If this is not the case:

  • What does s contain when putStrLn is executed?
  • In what state is the file handle when putStrLn is executed?
  • If when putStrLn is executed s does not contain the whole content of the file, when will this content actually be read, and when will the file be closed?
Giorgio asked Dec 13 '12 21:12


2 Answers

Is my third statement correct?

Not quite. The file is usually not closed "as soon as the last character has been read". It lingers for a few moments in the semi-closed state it was in during the read, and the IO manager/runtime closes it the next time it performs such actions. If you're rapidly opening and reading files, that lag may cause you to run out of file handles if the OS limit isn't very high.

For most use cases (in my limited experience), however, the closing of the file handle is timely enough. [There are people who disagree and view lazy IO as extremely dangerous in all cases. It definitely has pitfalls, but IMO its dangers are often overstated.]
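If the lag does become a problem when reading many files, one common workaround is to force each string before moving on to the next file. The following is only a minimal sketch: readFileNow and the many-*.txt file names are hypothetical names introduced for illustration, not something from base.

```haskell
import Control.Exception (evaluate)

-- Hypothetical helper (not part of base): read a file lazily, then force
-- the whole string so that end-of-file is reached before we return.
readFileNow :: FilePath -> IO String
readFileNow path = do
  s <- readFile path
  _ <- evaluate (length s)  -- walks the entire string, up to end-of-file
  return s

main :: IO ()
main = do
  -- Hypothetical file names, created here so the example is self-contained.
  let paths = ["many-" ++ show i ++ ".txt" | i <- [1 .. 5 :: Int]]
  mapM_ (\p -> writeFile p ("contents of " ++ p)) paths
  -- Each file is fully consumed before the next one is opened.
  contents <- mapM readFileNow paths
  mapM_ putStrLn contents
```

The price is that every file is held in memory as a complete String, so this only suits files that comfortably fit in memory.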

So, can I just invoke readFile without closing the file handle myself?

Yes. When you're using readFile, the file handle is closed automatically when the file contents have been entirely read or when it is noticed that the file handle is no longer referenced.

Will the handle stay open as long as I have not consumed (visited) the whole result string?

Not quite. readFile puts the file handle in a semi-closed state, described in the docs for hGetContents:

Computation hGetContents hdl returns the list of characters corresponding to the unread portion of the channel or file managed by hdl, which is put into an intermediate state, semi-closed. In this state, hdl is effectively closed, but items are read from hdl on demand and accumulated in a special list returned by hGetContents hdl.
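The semi-closed state can be observed with hIsOpen and hIsClosed from System.IO. This is a rough sketch rather than a definitive test: status, handleStates and the file name semi-closed-demo.txt are hypothetical names made up for the illustration, and it uses openFile plus hGetContents directly because readFile does not expose its handle.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: report (hIsOpen, hIsClosed) for a handle.
status :: Handle -> IO (Bool, Bool)
status h = (,) <$> hIsOpen h <*> hIsClosed h

-- Hypothetical helper: observe the handle right after opening, right after
-- hGetContents, and after the whole string has been consumed.
handleStates :: FilePath -> IO [(Bool, Bool)]
handleStates path = do
  h <- openFile path ReadMode
  opened <- status h
  s <- hGetContents h        -- nothing has been demanded from s yet
  semi <- status h
  _ <- evaluate (length s)   -- consume everything, up to end-of-file
  done <- status h
  return [opened, semi, done]

main :: IO ()
main = do
  -- Hypothetical file, created here so the example is self-contained.
  writeFile "semi-closed-demo.txt" "some text\n"
  states <- handleStates "semi-closed-demo.txt"
  mapM_ print states
```

With GHC I would expect the middle pair to be (False, False): a semi-closed handle counts as neither open nor closed. The last pair should report the handle as closed, since the entire contents have been read by then.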


foo :: String -> IO String
foo filename = do
              s <- readFile "t.txt"
              putStrLn "File has been read."
              return s

[Retracted, this paragraph is wrong for readFile:] Ah, that's one of the pitfalls of lazy IO on the other end. Here the file is closed before its contents have been read. When foo returns, the file handle isn't referenced anymore and is then closed. The consumer of foo's result will then find that s is an empty string, because when hGetContents tries to actually read from the file, the handle is already closed.

Correction: I confused the behaviour of readFile with that of

bracket (openFile file ReadMode) hClose hGetContents

there. readFile only closes the file handle after s is not referenced anymore, so it behaves correctly, as expected, here.

When the putStrLn is executed, I would (intuitively) expect that

  1. s contains the whole content of file t.txt,
  2. The handle used to read the file has been closed.

No, s does not contain anything yet but a recipe to maybe get some characters from the file handle. The file handle is semi-closed, but not closed. It will be closed when the file contents have been entirely read, or when s goes out of scope.

If this is not the case:

  • What does s contain when putStrLn is executed?
  • In what state is the file handle when putStrLn is executed?
  • If when putStrLn is executed s does not contain the whole content of the file, when will this content actually be read, and when will the file be closed?

The first two questions have been answered. The answer to the third is "the file will be read when the contents are consumed", and it will be closed when the entire contents have been read or when it is no longer referenced.
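One way to see that the handle is still held while s is unconsumed is to try to overwrite the file in between. This is only a sketch, and it assumes GHC's file locking (multiple readers or a single writer per file). tryWrite, probe and probe-demo.txt are hypothetical names introduced for the illustration.

```haskell
import Control.Exception (IOException, evaluate, try)

-- Hypothetical helper: try to overwrite a file and report whether it worked.
tryWrite :: FilePath -> String -> IO Bool
tryWrite path new = do
  r <- try (writeFile path new) :: IO (Either IOException ())
  return (either (const False) (const True) r)

-- Hypothetical helper: attempt a write before and after consuming the
-- lazily read string, returning both outcomes and the string itself.
probe :: FilePath -> IO (Bool, String, Bool)
probe path = do
  s <- readFile path                       -- file opened, nothing read yet
  early <- tryWrite path "overwritten early"
  _ <- evaluate (length s)                 -- now the contents are read
  late <- tryWrite path "overwritten late"
  return (early, s, late)

main :: IO ()
main = do
  -- Hypothetical file, created here so the example is self-contained.
  writeFile "probe-demo.txt" "original"
  result <- probe "probe-demo.txt"
  print result
```

Under that locking assumption, the early write should be refused because the semi-closed handle still counts as a reader, while the late write should succeed because the handle was closed once the contents had been read in full.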

That would be different with the above bracket invocation. bracket guarantees that the final operation, here the hClose, will be run even if the other actions throw an exception, which is why its use is often recommended. However, the hClose is run when bracket returns, and then the hGetContents can't get any contents from the now really closed file handle. readFile, on the other hand, would not necessarily close the file handle if an exception occurs.
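The bracket pitfall, and one possible way around it, can be sketched as follows. The names readTooLate, readInTime and bracket-demo.txt are hypothetical and only serve the illustration. What the late read produces depends on the version of base: it may be an empty string, as described above, or an exception raised when the string is forced, so the sketch catches either outcome instead of assuming one.

```haskell
import Control.Exception (SomeException, bracket, evaluate, try)
import System.IO

-- Hypothetical helper: the problematic pattern. hClose runs as soon as
-- hGetContents returns, before anything has been read from the file.
readTooLate :: FilePath -> IO String
readTooLate path = bracket (openFile path ReadMode) hClose hGetContents

-- Hypothetical helper: force the whole string while the handle is still
-- open, so the later hClose does no harm.
readInTime :: FilePath -> IO String
readInTime path = bracket (openFile path ReadMode) hClose $ \h -> do
  s <- hGetContents h
  _ <- evaluate (length s)
  return s

main :: IO ()
main = do
  -- Hypothetical file, created here so the example is self-contained.
  writeFile "bracket-demo.txt" "hello\n"
  late <- readTooLate "bracket-demo.txt"
  -- Forcing the string only now; catch whatever the runtime decides to do.
  r <- try (evaluate (length late)) :: IO (Either SomeException Int)
  putStrLn ("lazy read after hClose: " ++ show r)
  ok <- readInTime "bracket-demo.txt"
  putStrLn ("forced before hClose: " ++ show ok)
```

readInTime keeps bracket's guarantee that the handle is closed even if an exception occurs, at the cost of reading the whole file eagerly.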

That is one of the dangers or quirks of lazy IO: files are not read until their contents are demanded, and if you use lazy IO wrongly, that will be too late and you won't get any contents.

It's a trap many (or even most) fall into at one time or another, but after having been bitten by it, one quickly learns when IO needs to be non-lazy and does it non-lazily in those cases.

The alternatives (iteratees, enumerators, conduits, pipes, ...) avoid those traps [unless the implementer made a mistake], but are considerably less nice to use in those cases where lazy IO is perfectly fine. On the other hand, they treat the cases where laziness is not desired much better.

Daniel Fischer answered Oct 25 '22 22:10


When the putStrLn is executed, I would (intuitively) expect that s contains the whole content of file t.txt,

You need to keep in mind that you're using lazy IO here. Reading from the file merely creates an unevaluated string computation that, if it is later required, will then read the file.

By using lazy IO you defer your IO until the value is needed.

Once the last character of your file has been read, or all references to the open file are dropped (e.g. your s value), your open file will be closed by the garbage collector.
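The deferral itself can be illustrated without any files. The sketch below uses unsafeInterleaveIO from System.IO.Unsafe, the kind of primitive that lazy IO in GHC is built on, and is meant purely as an illustration of "IO deferred until the value is needed", not as a recommended technique. deferredDemo is a hypothetical name.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper: records whether a deferred action has run before
-- and after its result is demanded.
deferredDemo :: IO (Bool, Int, Bool)
deferredDemo = do
  ran <- newIORef False
  -- The action is not run here; it runs when v is first evaluated.
  v <- unsafeInterleaveIO (writeIORef ran True >> return 42)
  before <- readIORef ran
  forced <- evaluate v       -- demanding the value triggers the action
  after <- readIORef ran
  return (before, forced, after)

main :: IO ()
main = do
  result <- deferredDemo
  print result  -- prints (False,42,True)
```

The string returned by readFile behaves analogously: the read happens when the consumer demands characters, not when the readFile action is executed.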

Don Stewart answered Oct 25 '22 20:10