
In Perl, can I limit the length of a line as I read it in from a file (like fgets)?

Tags:

perl

fgets

I'm trying to write code that reads a file line by line and stores each line, up to a certain amount of input data. I want to guard against a malicious end user putting something like a gigabyte of data on a single line, and also against reading in an abnormally large file. Doing $str = <FILE> still reads a whole line, which could be very long and exhaust my memory.

fgets lets me do this by specifying a maximum number of bytes to read on each call, which essentially splits one long line into pieces of my maximum length. Is there a similar way to do this in Perl? I saw something about sv_gets but am not sure how to use it (though I only did a cursory Google search).

The goal of this exercise is to avoid having to do additional parsing / buffering after reading data. fgets stops after N bytes or when a newline is reached.

EDIT: I think I confused some people. I want to read X lines, each with a maximum length of Y. I don't want to read more than Z bytes in total, and I would prefer not to read all Z bytes at once. I guess I could just do that and split the lines, but I'm wondering if there's some other way. If that's the best way, then using the read function and parsing manually is my easiest bet.
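For reference, the read-and-split fallback described in the edit could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name read_limited, its parameter order, and the 4096-byte chunk size are all made up here, and the handle is assumed to be a plain byte-oriented filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return up to $max_lines pieces,
# each at most $max_len bytes (newline included), while consuming no more
# than $max_total bytes from $fh.
sub read_limited {
    my ($fh, $max_lines, $max_len, $max_total) = @_;
    my @lines;
    my $buf   = '';
    my $taken = 0;       # bytes consumed from $fh so far
    my $chunk = 4096;    # arbitrary size for each read() call (an assumption)

    while (@lines < $max_lines) {
        # A complete line that fits within the per-line limit.
        my $nl = index($buf, "\n");
        if ($nl >= 0 && $nl < $max_len) {
            push @lines, substr($buf, 0, $nl + 1, '');
            next;
        }
        # An over-long line: cut it at the limit, as fgets would.
        if (length($buf) >= $max_len) {
            push @lines, substr($buf, 0, $max_len, '');
            next;
        }
        # Otherwise fetch more data, staying inside the total budget.
        my $want = $max_total - $taken;
        $want = $chunk if $want > $chunk;
        last if $want <= 0;
        my $got = read($fh, $buf, $want, length $buf);
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of file
        $taken += $got;
    }

    # A trailing piece with no newline, left over at EOF or budget exhaustion.
    push @lines, $buf if length $buf && @lines < $max_lines;
    return @lines;
}

# Usage: an in-memory filehandle stands in for a real file.
open my $fh, '<', \"one\ntwo is longer\nthree\n" or die "open: $!";
print "[$_]" for read_limited($fh, 3, 8, 1024);
```

Because this reads ahead in chunks, the handle may have been advanced past the last piece returned; any bytes still buffered when the line limit is reached are discarded. Whether that is acceptable depends on whether the caller keeps using the same handle afterwards.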

Thanks.

NG. asked May 28 '10 15:05


2 Answers

As an exercise, I've implemented a wrapper around C's fgets() function. It falls back to a Perl implementation for complicated filehandles, defined as "anything without a fileno", to cover tied handles and whatnot. File::fgets is on its way to CPAN now; you can pull a copy from the repository.

Some basic benchmarking shows it's over 10x faster than any of the implementations here. However, I cannot say it's bug-free or doesn't leak memory; my XS skills are not that great, but it's better tested than anything here.

Schwern answered Nov 14 '22 17:11


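# Read up to $len characters from $fh, stopping early at a newline or EOF.
# Unlike C's fgets(), the newline is consumed but not returned, and an
# empty string is returned at EOF.
sub heres_what_id_do($$) {
    my ($fh, $len) = @_;
    my $buf = '';

    for (my $i = 0; $i < $len; ++$i) {
        my $ch = getc $fh;
        last if !defined $ch || $ch eq "\n";   # EOF or end of line
        $buf .= $ch;
    }

    return $buf;
}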

Not very "Perlish" but who cares? :) The OS (and possibly Perl itself) will do all the necessary buffering underneath.
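One caveat with the getc loop above: it drops the newline and returns an empty string at end of file, so a caller cannot tell a blank line from EOF. A variant closer to fgets might look like the sketch below. The name fgets_like is hypothetical, and unlike C's fgets (which reads at most n-1 bytes) it reads up to $len characters.

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $len characters from $fh, stopping
# after a newline. Returns undef when nothing could be read (EOF).
sub fgets_like {
    my ($fh, $len) = @_;
    my $buf;

    while ($len-- > 0) {
        my $ch = getc $fh;
        last unless defined $ch;    # EOF or read error
        $buf .= $ch;
        last if $ch eq "\n";        # keep the newline, as fgets does
    }

    return $buf;
}

# Usage: an in-memory filehandle stands in for a real file.
open my $fh, '<', \"short\nthis line is long\n" or die "open: $!";
while (defined(my $piece = fgets_like($fh, 8))) {
    print "[$piece]";
}
```

Since the return value is undef only at end of file, the usual while (defined(...)) loop works, and a piece that ends without a newline signals that the line was cut at the limit (or that the file lacks a final newline).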

j_random_hacker answered Nov 14 '22 18:11