Rebol 3: reading STDIN efficiently line by line (to make an awk-like tool)

I am trying to make an awk-like tool that uses Rebol 3 to process larger text files with bash pipes and tools, and I am having a problem reading STDIN line by line in Rebol 3.

For example, this shell command produces 3 lines:

$ (echo "first line" ; echo "second line" ; echo "third line" )
first line
second line
third line

But Rebol's input word reads all 3 lines at once. I would expect it to stop at a newline, as it does when you use input interactively:

r3 --do 'while [ x: input ] [ if empty? x [ break ] print x print   "***" ]' 
abcdef
abcdef
***
blabla
blabla
***

But when I run it all together, it reads the whole input at once. I could read everything at once and split it into lines, but I want it to work in a "streaming" manner, since I usually cat in many thousands of lines.

$ (echo "first line" ; echo "second line" ; echo "third line" )  \
  | r3 --do 'while [ x: input ] [ if empty? x [ break ] print x print "***" ]' 
first linesecond linethird line
***

I also looked at the source of input to make a similar function. I could read character by character in a while loop and check for newlines, but that doesn't seem efficient.

asked Jan 27 '17 by middayc

2 Answers

I figured it out, and it seems to work well even on big files of 10000+ lines. It could be written more elegantly and improved, though.

The function r3awk takes a block of code and executes it for each line of STDIN, binding the word line to the current line:

r3awk: func [ code /local lines line partial ] [
    partial: copy ""
    lines: read/lines/string system/ports/input
    while [ not empty? lines ] [
        ;; prepend the partial line left over from the previous batch
        lines/1: rejoin [ partial lines/1 ]
        ;; the last line of a batch may be cut mid-line; hold it back
        partial: pull lines
        foreach line lines [
            do bind code 'line
        ]
        ;; read the next batch; on a read error, treat it as end of input
        if error? try [ lines: read/lines/string system/ports/input ] [ lines: copy [] ]
    ]
    ;; process the final (this time complete) line
    line: partial
    do bind code 'line
]

It works like this: read/lines reads a number of characters from the stream and returns them as a block of lines. Each call reads the next batch of characters, so everything is wrapped in a while loop. The code block is executed (do-ne) as the while loop runs, not after all input has been read.

A batch of characters doesn't necessarily end on a newline, so the last line of each batch may be partial, and so may the first line of the next batch; hence the function joins them together. At the end it still has to process the last (this time complete) line. The try is there because some lines caused UTF encoding errors.
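For comparison, the same batching idea can be sketched in Python (a hedged illustration of the technique, not part of the original answer): read a chunk, split it into lines, hold back the trailing partial line, and prepend it to the next chunk.

```python
def awk_lines(stream, chunk_size=65536):
    """Yield complete lines from a stream, joining partial lines across chunks."""
    partial = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (partial + chunk).split("\n")
        partial = lines.pop()   # last piece may be cut mid-line; hold it back
        for line in lines:
            yield line
    if partial:                 # final line with no trailing newline
        yield partial
```

With sys.stdin as the stream, this gives streaming line-by-line processing without loading the whole input.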

It can be used like this on the command line:

(echo "first line" ; echo "second line" ; echo "third line" ) | \
 r3 --import utils.r --do 'r3awk [ parse line [ copy x to space (print x) ] ]'
first
second
third

Things to improve: make the function generally better and deduplicate some code. Also check what happens if the read/lines batch ends exactly on a newline.

answered Oct 03 '22 by middayc


I came across the same problem with input a couple of years ago. I don't think this is a planned change but rather an incomplete implementation (touch wood!).

Here is a workaround function I wrote at that time (which has worked fine for me on MacOS & Linux).

input-line: function [
    {Return next line (string!) from STDIN.  Returns NONE when nothing left}
    /part size [integer!] "Internal read/part (buffer) size"
  ][
    buffer: {}    ;; static
    if none? part [size: 1024]

    forever [
        if f: find buffer newline [
            remove f    ;; chomp newline (NB. doesn't cover Windows CRLF?)
            break
        ]

        if empty? data: read/part system/ports/input size [
            f: length? buffer
            break
        ]

        append buffer to-string data
    ]

    unless all [empty? data empty? buffer] [take/part buffer f]
]

Usage example:

while [not none? line: input-line] [
    ;; do something with LINE of data from STDIN
]
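The same chunked-buffer approach can be sketched in Python for comparison (an illustration of the technique, not part of the original workaround): keep a buffer, read fixed-size chunks until it contains a newline, then chop one line off the front. Returns None when nothing is left, like input-line above.

```python
def make_input_line(stream, size=1024):
    """Return a next_line() function yielding one line per call from the stream,
    buffering read(size) chunks internally; returns None at end of input."""
    buffer = ""

    def next_line():
        nonlocal buffer
        while "\n" not in buffer:
            data = stream.read(size)
            if not data:                      # EOF: flush whatever remains
                if buffer:
                    line, buffer = buffer, ""
                    return line
                return None
            buffer += data
        line, buffer = buffer.split("\n", 1)  # chop one line off the front
        return line

    return next_line
```

With sys.stdin as the stream, `while (line := next_line()) is not None: ...` mirrors the usage loop above.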
answered Oct 03 '22 by draegtun