Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a good CPAN module to implement state machines when parsing text?

When parsing text, I frequently need to implement mini-state-machines, in the generic form following the code below.

Is there a CPAN module that's considered "best practice" and well suited to implement state machine logic like this in an easy and elegant way?

I would prefer solutions less complicated than Parse::RecDescent but if none exist and Parse::RecDescent is a lot easier to apply to this problem than I thought, I'm very willing to consider it instead of rolling my own like I've been so far.

Example generic parsing code:

my $state = 1;
while (my $token = get_next_token()) { # Usually next line
    if ($state == 1) {
        do_state1_processing();
        if (token_matches_transition_1_to_2($token)) {
            do_state_1_to_2_transition_processing();
            $state == 2;
            next;
        } elsif (token_matches_transition_1_to_4($token)) {
            do_state_1_to_4_transition_processing();
            $state == 4;
            next;
        } else {
             do_state1_continuation();
             next;
        }
    } elsif ($state == 5) {
        do_state5_processing();
        if (token_matches_transition_5_to_6($token)) {
            do_state_5_to_6_transition_processing();
            $state == 6;
            next;
        } elsif (token_matches_transition_5_to_4($token)) {
            do_state_5_to_4_transition_processing();
            $state == 4;
            next;
        } else {
             do_state5_continuation();
             next;
        }
    } else {

    }

}
like image 858
DVK Avatar asked Feb 20 '12 11:02

DVK


3 Answers

I would recommend taking a look at Marpa and Marpa::XS.

Just look at this simple calculator.

my $grammar = Marpa::XS::Grammar->new(
    {   start   => 'Expression',
        actions => 'My_Actions',
        default_action => 'first_arg',
        rules   => [
            { lhs => 'Expression', rhs => [qw'Term'] },
            { lhs => 'Term', rhs => [qw'Factor'] },
            { lhs => 'Factor', rhs => [qw'Number'] },
            { lhs => 'Term', rhs => [qw'Term Add Term'], action => 'do_add' },
            {   lhs    => 'Factor',
                rhs    => [qw'Factor Multiply Factor'],
                action => 'do_multiply'
            },
        ],
    }
);

You will have to implement the tokenizer yourself.

like image 65
Brad Gilbert Avatar answered Nov 10 '22 11:11

Brad Gilbert


You can use Class::StateMachine:

package Foo;
use parent 'Class::StateMachine';

sub new {
    my $class = shift;
    Class::StateMachine::bless {}, $class, 'state_1';
}

sub do_state_processing :OnState('state_1') {
  my $self = shift;
  if    (...) { $self->event_1 }
  elsif (...) { $self->event_2 }
  ...
}

sub do_state_processing :OnState('state_2') {
  ...
}

sub event_1 :OnState('state_1') {
  my $self = shift;
  $self->state('state_2');
}

sub event_2 :OnState('state_2') {
  my $self = shift;
  $self->state('state_3');
}

sub enter_state :OnState('state_1') {
  print "entering state 1";
  ...
}

sub enter_state :OnState('state_2') {
  ...
}

package main;

my $sm = Foo->new;
...
while (my $token = get_next_token()) {
  $sm->do_state_processing;
}

Though, a module specific to text processing will probably be more appropriate for your particular case

like image 20
salva Avatar answered Nov 10 '22 11:11

salva


(With help) I wrote something a few years back called the Perl Formal Language Toolkit so this might serve as some sort of basis, however what I think you really want is a tool like the Ragel Finite State Machine Compiler. Unfortunately, it doesn't output to Perl, and it's a back- burner desire of mine to implement a Perl target for Ragel and to also provide similar (but more Perl oriented) features for my bit-rotting module.

like image 29
B. Estrade Avatar answered Nov 10 '22 10:11

B. Estrade