Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected behavior of io:fread in Erlang

Tags:

erlang

This is an Erlang question.

I have run into some unexpected behavior by io:fread.

I was wondering if someone could check whether there is something wrong with the way I use io:fread or whether there is a bug in io:fread.

I have a text file which contains a "triangle of numbers"as follows:

59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
61 95 66 57 25 68
90 81 80 38 92 67 73
30 28 51 76 81 18 75 44
...

There is a single space between each pair of numbers and each line ends with a carriage-return new-line pair.

I use the following Erlang program to read this file into a list.

-module(euler67).
-author('Cayle Spandon').

-export([solve/0]).

solve() ->
    {ok, File} = file:open("triangle.txt", [read]),
    Data = read_file(File),
    ok = file:close(File),
    Data.

read_file(File) ->
    read_file(File, []).

read_file(File, Data) ->
    case io:fread(File, "", "~d") of
        {ok, [N]} -> 
            read_file(File, [N | Data]);
        eof ->
            lists:reverse(Data)
    end.

The output of this program is:

([email protected])30> euler67:solve().
[59,73,41,52,40,9,26,53,6,3410,51,87,86,8161,95,66,57,25,
 6890,81,80,38,92,67,7330,28,51,76,81|...]

Note how the last number of the fourth line (34) and the first number of the fifth line (10) have been merged into a single number 3410.

When I dump the text file using "od" there is nothing special about those lines; they end with cr-nl just like any other line:

> od -t a triangle.txt
0000000    5   9  cr  nl   7   3  sp   4   1  cr  nl   5   2  sp   4   0
0000020   sp   0   9  cr  nl   2   6  sp   5   3  sp   0   6  sp   3   4
0000040   cr  nl   1   0  sp   5   1  sp   8   7  sp   8   6  sp   8   1
0000060   cr  nl   6   1  sp   9   5  sp   6   6  sp   5   7  sp   2   5
0000100   sp   6   8  cr  nl   9   0  sp   8   1  sp   8   0  sp   3   8
0000120   sp   9   2  sp   6   7  sp   7   3  cr  nl   3   0  sp   2   8
0000140   sp   5   1  sp   7   6  sp   8   1  sp   1   8  sp   7   5  sp
0000160    4   4  cr  nl   8   4  sp   1   4  sp   9   5  sp   8   7  sp

One interesting observation is that some of the numbers for which the problem occurs happen to be on 16-byte boundary in the text file (but not all, for example 6890).

like image 772
Bruno Rijsman Avatar asked Jan 23 '09 15:01

Bruno Rijsman


2 Answers

I'm going to go with it being a bug in Erlang, too, and a weird one. Changing the format string to "~2s" gives equally weird results:

["59","73","4","15","2","40","0","92","6","53","0","6","34",
 "10","5","1","87","8","6","81","61","9","5","66","5","7",
 "25","6",
 [...]|...]

So it appears that it's counting a newline character as a regular character for the purposes of counting, but not when it comes to producing the output. Loopy as all hell.

A week of Erlang programming, and I'm already delving into the source. That might be a new record for me...

EDIT

A bit more investigation has confirmed for me that this is a bug. Calling one of the internal methods that's used in fread:

> io_lib_fread:fread([], "12 13\n14 15 16\n17 18 19 20\n", "~d").           
{done,{ok,"\f"}," 1314 15 16\n17 18 19 20\n"}

Basically, if there's multiple values to be read, then a newline, the first newline gets eaten in the "still to be read" part of the string. Other testing suggests that if you prepend a space it's OK, and if you lead the string with a newline it asks for more.

I'm going to get to the bottom of this, gosh-darn-it... (grin) There's not that much code to go through, and not much of it deals specifically with newlines, so it shouldn't take too long to narrow it down and fix it.

EDIT^2

HA HA! Got the little blighter.

Here's the patch to the stdlib that you want (remember to recompile and drop the new beam file over the top of the old one):

--- ../erlang/erlang-12.b.3-dfsg/lib/stdlib/src/io_lib_fread.erl
+++ ./io_lib_fread.erl
@@ -35,9 +35,9 @@
     fread_collect(MoreChars, [], Rest, RestFormat, N, Inputs).

 fread_collect([$\r|More], Stack, Rest, RestFormat, N, Inputs) ->
-    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\r|More]);
 fread_collect([$\n|More], Stack, Rest, RestFormat, N, Inputs) ->
-    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, More);
+    fread(RestFormat, Rest ++ reverse(Stack), N, Inputs, [$\n|More]);
 fread_collect([C|More], Stack, Rest, RestFormat, N, Inputs) ->
     fread_collect(More, [C|Stack], Rest, RestFormat, N, Inputs);
 fread_collect([], Stack, Rest, RestFormat, N, Inputs) ->
@@ -55,8 +55,8 @@
                eof ->
                    fread(RestFormat,eof,N,Inputs,eof);
                _ ->
-                   %% Don't forget to count the newline.
-                   {more,{More,RestFormat,N+1,Inputs}}
+                   %% Don't forget to strip and count the newline.
+                   {more,{tl(More),RestFormat,N+1,Inputs}}
            end;
        Other ->                                %An error has occurred
            {done,Other,More}

Now to submit my patch to erlang-patches, and reap the resulting fame and glory...

like image 193
womble Avatar answered Oct 14 '22 22:10

womble


Besides the fact that it seems to be a bug in one of the erlang libs I think you could (very) easily circumvent the problem.

Given the fact your file is line-oriented I think best practice is that you process it line-by-line as well.

Consider the following construction. It works nicely on an unpatched erlang and because it uses lazy evaluation it can handle files of arbitrary length without having to read all of it into memory first. The module contains an example of a function to apply to each line - turning a line of text-representations of integers into a list of integers.


-module(liner).
-author("Harro Verkouter").
-export([liner/2, integerize/0, lazyfile/1]).

% Applies a function to all lines of the file
% before reducing (foldl).
liner(File, Fun) ->
    lists:foldl(fun(X, Acc) -> Acc++Fun(X) end, [], lazyfile(File)).

% Reads the lines of a file in a lazy fashion
lazyfile(File) ->
    {ok, Fd} = file:open(File, [read]),
    lazylines(Fd).
% Actually, this one does the lazy read ;)
lazylines(Fd) ->
    case io:get_line(Fd, "") of
        eof -> file:close(Fd), [];
        {error, Reason} ->
            file:close(Fd), exit(Reason);
        L ->
            [L|lazylines(Fd)]
    end.

% Take a line of space separated integers (string) and transform
% them into a list of integers
integerize() ->
    fun(X) ->
        lists:map(fun(Y) -> list_to_integer(Y) end,
                string:tokens(X, " \n")) end.


Example usage:
Eshell V5.6.5  (abort with ^G)
1> c(liner).
{ok,liner}
2> liner:liner("triangle.txt", liner:integerize()).
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
 68,90,81,80,38,92,67,73,30|...]

And as a bonus, you can easily fold over the lines of any (lineoriented) file w/o running out of memory :)

6> lists:foldl( fun(X, Acc) -> 
6>                  io:format("~.2w: ~s", [Acc,X]), Acc+1
6>                  end,
6>              1,  
6>              liner:lazyfile("triangle.txt")).                                        
 1: 59
 2: 73 41
 3: 52 40 09
 4: 26 53 06 34
 5: 10 51 87 86 81
 6: 61 95 66 57 25 68
 7: 90 81 80 38 92 67 73
 8: 30 28 51 76 81 18 75 44

Cheers, h.

like image 2
haavee Avatar answered Oct 14 '22 23:10

haavee