Erlang: Read from an input stream in an efficient way

I'm writing a program that reads from an input stream. It is run like this:

erl -run p main -noshell -s erlang halt < input

The problem is that it takes a lot of time to read it (the input stream is huge) using this read function:

read_input(L) ->
    case io:get_line("") of
        eof ->
            lists:reverse(L);
        E0 ->
            read_input([E0|L])
    end.

I have been looking for more efficient alternatives, but I have found nothing. I have tried to read the file using

{ok, Binary} = file:read_file("input")

This is far more efficient. The problem is that I have to run this program on a platform where the file name is unknown, so I need an alternative. Additionally, I can't choose the flags used when running it; e.g. the -noinput flag cannot be added to the command line.

Whatever help you can give will be welcomed.

Salvador Tamarit asked May 07 '16 15:05


2 Answers

You can use open_port/2 to open stdin and read binaries from it. For example:

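-module(p).
-export([start/0]).

start() ->
    %% Trap exits so that the port closing arrives as an 'EXIT' message.
    process_flag(trap_exit, true),
    %% {fd,0,1} uses file descriptor 0 (stdin) for input and 1 (stdout) for output.
    P = open_port({fd,0,1}, [in, binary]),
    Bin = read(P,<<>>),
    io:format("received ~p\n", [Bin]),
    halt(0).

%% Append each chunk delivered by the port until the port closes.
read(P, Bin) ->
    receive
        {P, {data, Data}} ->
            read(P, <<Bin/binary, Data/binary>>);
        {'EXIT',P,_} ->
            Bin
    end.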

The code has to trap exits so it knows to exit its reading loop when the port closes. This example reads everything into a single binary returned from the read/2 function and then prints it out and exits, but obviously you can perform further operations on the binary in your actual application.

You can run it like this:

erl -noinput -s p < input
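
As one illustration of the "further operations" mentioned above, the binary returned by read/2 could be split into lines. This is only a minimal sketch, assuming the input is newline-delimited; the module name p_split and the function lines/1 are hypothetical and not part of the original answer:

```erlang
-module(p_split).
-export([lines/1]).

%% Split a binary into lines on "\n" (hypothetical helper).
%% The `trim` option drops the empty part that a trailing newline
%% would otherwise leave at the end; "\r\n" endings are not handled.
lines(Bin) when is_binary(Bin) ->
    binary:split(Bin, <<"\n">>, [global, trim]).
```

In start/0 you would then call something like p_split:lines(Bin) instead of printing the whole binary. Empty lines in the middle of the input are kept as empty binaries.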
Steve Vinoski answered Nov 10 '22 02:11


Although Steve's solution is the fastest one I know of, a solution using the file module also performs quite well:

-module(p).

-export([start/0]).

-define(BLK_SIZE, 16384).

start() ->
    do(),
    halt().

do() ->
    Bin = read(),
    io:format("~p~n", [byte_size(Bin)]).

read() ->
    ok = io:setopts(standard_io, [binary]),
    read(<<>>).

read(Acc) ->
    case file:read(standard_io, ?BLK_SIZE) of
        {ok, Data} ->
            read(<<Acc/bytes, Data/bytes>>);
        eof ->
            Acc
    end.

It works with an invocation like:

erl -noshell -s p < input

Note that both approaches can be used for line-oriented input: use the {line, Max_Line_Size} option for the port, or file:read_line/1 for the file module solution. I found a performance bug in file:read_line/1 that has been fixed since version 17 (if I recall correctly), so it performs well now. Anyway, you should not expect the performance and comfort of Perl.
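
A line-oriented variant of the file module solution might look roughly like the sketch below. It assumes standard_io has been switched to binary mode as in the code above; the module name p_readline and the helper read_lines/1 are made up for illustration, and a read error would simply crash with a case_clause:

```erlang
-module(p_readline).
-export([start/0, read_lines/1]).

start() ->
    %% Switch standard_io to binary mode so lines arrive as binaries.
    ok = io:setopts(standard_io, [binary]),
    Lines = read_lines(standard_io),
    io:format("~p~n", [length(Lines)]),
    halt().

%% Read every line from Device; each line keeps its trailing newline.
read_lines(Device) ->
    read_lines(Device, []).

read_lines(Device, Acc) ->
    case file:read_line(Device) of
        {ok, Line} ->
            read_lines(Device, [Line | Acc]);
        eof ->
            lists:reverse(Acc)
    end.
```

It should work with the same kind of invocation, e.g. erl -noshell -s p_readline < input.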

Hynek -Pichi- Vychodil answered Nov 10 '22 02:11