
functional Perl: Filter, Iterator

I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like

# comment line - ignore

# ignore also empty lines
key1 = value
key2 = value1, value2, value3

I want a function that takes an iterator over the lines of the file and returns a map from keys to lists of values. To keep it functional and structured, I'd like to:

  • use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
  • The mentioned filter(s) should be defined outside of the function for reusability by other functions.
  • use another function that is given the line and returns a tuple of key and values string
  • use another function that breaks the comma separated values into a list of values.

What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.

For reference, here is (a quick hack) how I might do it in Python:

import re
from itertools import ifilter, imap  # Python 2 only

re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
# Parentheses are needed so the expression can continue on the next line.
is_interesting_line = lambda line: (not ("" == line or re_is_comment_line.match(line))
                                    and re_key_values.match(line))

def parse(lines):
    # Each step is lazy: nothing is read until dict() consumes the pipeline.
    interesting_lines = ifilter(is_interesting_line, imap(str.strip, lines))
    key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
    splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
    return dict(splitted_values)
asked Apr 25 '26 00:04 by Thomas Koch


2 Answers

A direct translation of your Python would be

my $re_is_comment_line = qr/^\s*#/;
my $re_key_values      = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter        = qr/\s*,\s*/;
my $is_interesting_line= sub {
  local $_ = shift; # "my $_" was removed in Perl 5.24, so localize the global $_ instead
  length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};

sub parse {
  my @lines = @_;
  my @interesting_lines = grep $is_interesting_line->($_), @lines;
  my @key_values = map [/$re_key_values/], @interesting_lines;
  my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } @key_values;
  return %splitted_values;
}

Differences are:

  • ifilter is called grep. Its first argument can be either an expression or a block; both play roughly the role of a lambda, and the current item is available in the $_ variable. The same applies to map.
  • Perl doesn't emphasize laziness and seldom uses iterators. There are cases where they are required, but usually the whole list is evaluated at once.

The next example makes the following changes:

  • Regexes don't have to be precompiled; a regex literal without interpolated variables is compiled only once anyway.
  • Instead of extracting key/values with regexes, we use split. It takes an optional third argument that limits the number of resulting fragments.
  • The whole map/filter stuff can be written in one expression. This doesn't make it more efficient, but emphasizes the flow of data. Read the map-map-grep from the bottom upwards (actually right to left, think of APL).

sub parse {
  my %splitted_values =
    map { $_->[0], [split /\s*,\s*/, $_->[1]] }
    map {[split /\s*=\s*/, $_, 2]}
    grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
    @_;
  return \%splitted_values; # returning a reference improves efficiency
}
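
If the individual steps should be reusable and separately testable, as the question asks, the blocks can be pulled out into named subs. The following is only a sketch: the names is_interesting_line, split_key_value, split_values and parse_lines are made up for illustration, and the regexes follow the ones used above.

```perl
use strict;
use warnings;

# Hypothetical helper names; each sub can be tested on its own.

# True for lines that are neither blank nor comments and look like "key = ...".
sub is_interesting_line {
    my ($line) = @_;
    return ($line =~ /\S/ && $line !~ /^\s*#/ && $line =~ /^\s*\w+\s*=/) ? 1 : 0;
}

# "key = a, b" -> ("key", "a, b"), with surrounding whitespace removed.
sub split_key_value {
    my ($line) = @_;
    my ($key, $values) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
    return ($key, $values);
}

# "a, b , c" -> ("a", "b", "c")
sub split_values {
    my ($values) = @_;
    return split /\s*,\s*/, $values;
}

# Compose the helpers; returns a reference to a hash of array references.
sub parse_lines {
    my %config =
        map  { my ($k, $v) = split_key_value($_); ($k => [ split_values($v) ]) }
        grep { is_interesting_line($_) }
        @_;
    return \%config;
}

my $config = parse_lines(
    '# comment line - ignore',
    '',
    'key1 = value',
    'key2 = value1, value2, value3',
);
print "$_ = @{ $config->{$_} }\n" for sort keys %$config;
```

With the sample input from the question this prints "key1 = value" and "key2 = value1 value2 value3". The data still flows from the bottom of the expression upwards, but each step now has a name that other code can reuse.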

But I think a more elegant solution here is to use a traditional loop:

sub parse {
  my %splitted_values;
  LINE: for (my @lines = @_) { # loop over a copy: $_ aliases each element and is modified below
    next LINE if !length or /^\s*#/;
    s/\A\s+|\s+\z//g; # Trimming the string (omitted in previous examples)
    my ($key, $vals) = split /\s*=\s*/, $_, 2;
    defined $vals or next LINE; # check if $vals was assigned
    @{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # Automatically create array in $splitted_values{$key}
  }
  return \%splitted_values;
}

If we decide to pass a filehandle instead, the loop would be replaced with

my $fh = shift;
LOOP: while (<$fh>) {
  chomp;
  ...;
}

which would use an actual iterator.
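
For the lazy, iterator-wrapping style the question describes, closures are one option. This is a minimal sketch under a simple assumption: an iterator is a code reference that returns the next item, or undef when the input is exhausted. The names lines_iter, ifilter, imap and parse_fh are hypothetical (borrowed from the Python version), not built-ins or module functions.

```perl
use strict;
use warnings;

# Wrap a filehandle; a line is read only when the iterator is called.
sub lines_iter {
    my ($fh) = @_;
    return sub {
        my $line = <$fh>;
        chomp $line if defined $line;
        return $line;
    };
}

# Lazy counterpart of grep: skip items until the predicate accepts one.
sub ifilter {
    my ($pred, $iter) = @_;
    return sub {
        while (defined(my $item = $iter->())) {
            return $item if $pred->($item);
        }
        return;    # exhausted
    };
}

# Lazy counterpart of map (one output per input; $fn must not return undef,
# because undef marks the end of the iteration in this sketch).
sub imap {
    my ($fn, $iter) = @_;
    return sub {
        my $item = $iter->();
        return defined $item ? $fn->($item) : undef;
    };
}

sub parse_fh {
    my ($fh) = @_;
    my $trimmed = imap(sub { my $l = shift; $l =~ s/^\s+|\s+$//g; $l },
                       lines_iter($fh));
    # Requiring "key =" also rejects blank lines and comment lines.
    my $wanted = ifilter(sub { $_[0] =~ /^\w+\s*=/ }, $trimmed);
    my $pairs = imap(sub {
        my ($key, $values) = split /\s*=\s*/, $_[0], 2;
        return [ $key, [ split /\s*,\s*/, $values ] ];
    }, $wanted);

    my %config;
    while (defined(my $pair = $pairs->())) {
        $config{ $pair->[0] } = $pair->[1];
    }
    return \%config;
}

# Demo with an in-memory filehandle standing in for a real file.
my $text = "# comment line - ignore\n\nkey1 = value\nkey2 = value1, value2, value3\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $config = parse_fh($fh);
print "$_ = @{ $config->{$_} }\n" for sort keys %$config;
```

Nothing is read from the filehandle until the while loop in parse_fh starts pulling pairs, so the wrappers can be defined once and reused by other functions. For in-memory lists, plain grep and map remain the more common choice.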

You could now go on and pass the individual steps in as function parameters (callbacks), but do this only if you are optimizing for flexibility and nothing else. I already used a code reference in the first example; you invoke one with the $code->(@args) syntax.

use Carp; # Error handling for writing APIs
sub parse {
  my $args = shift;
  my $interesting  = $args->{interesting}   or croak qq("interesting" callback required);
  my $kv_splitter  = $args->{kv_splitter}   or croak qq("kv_splitter" callback required);
  my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default

  my %splitted_values;
  LINE: for (my @lines = @_) { # loop over a copy so the s/// below cannot modify the caller's data
    next LINE unless $interesting->($_);
    s/\A\s*|\s*\z//g;
    my ($key, $vals) = $kv_splitter->($_);
    defined $vals or next LINE;
    $splitted_values{$key} = $val_transform->($vals);
  }
  return \%splitted_values;
}

This could then be called like

my $data = parse {
  interesting   => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
  kv_splitter   => sub { split /\s*=\s*/, $_[0], 2 },
  val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, @lines;
answered Apr 26 '26 13:04 by amon


I think the most modern approach is to take advantage of CPAN modules. For your example, Config::Properties may help:

use strict;
use warnings;
use Config::Properties;

my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
answered Apr 26 '26 12:04 by Miguel Prz


