
To split at whitespace in Perl

Tags: perl

I have data of this form:

500 3.6673656 
----------
1000 3.2707536
----------
1500 3.2356145
----------
2000 3.0495141
----------
2500 3.016674

That is, each line holds a time and a distance. I need to store the times in one array and the distances in another. Using my @line = split( /\s+/, $_); I could store the distances in one array, but I can't store the times. Is there another way to store each of them in a separate array?

The input is read from a file and its contents are stored in @array.

My script:

    foreach $_ (@array) {
        if ($_ =~ /[@]/) {
            # do nothing, it's a comment or formatting line
        }
        else {
            my @line = split( /\s+/, $_);
            print "@line\n";
        }
    }
Shivani Kapadia asked Dec 01 '22 00:12



1 Answer

Let's get all fancy pants:

#! /usr/bin/env perl
#

use strict;
use warnings;
use feature qw(say);

my @array;
while ( my $line = <DATA> ) {
    chomp $line;
    push @array, $line;
}

#
# As two separate arrays (Not so good)
#
my @times;
my @distances;
for my $entry ( @array ) {
    chomp $entry;               # Not needed, but never hurts
    next if $entry =~ /^-+$/;   # Skip separator lines made up entirely of dashes
    my ( $time, $distance ) = split /\s+/, $entry;   # Column 1 is time, column 2 is distance
    push @times, $time;
    push @distances, $distance;
}
say "The first entry as two distinct arrays";
say "Distance: $distances[0]";
say "Time: $times[0]";

#
# As two entries in a single array
#
my @velocities;
for my $entry ( @array ) {
    chomp $entry;               # Not needed, but never hurts
    next if $entry =~ /^-+$/;   # Skip separator lines made up entirely of dashes
    my @velocity = split /\s+/, $entry;
    push @velocities, \@velocity;
}
say "The first entry as an array of arrays";
say "Distance: " . $velocities[0]->[1];   # Column 2 is distance
say "Time: " . $velocities[0]->[0];       # Column 1 is time
#
# As a hash in an array (Better Still)
# Note: Using regular expression to split
#
my @velocities2;
for my $entry ( @array ) {
    chomp $entry;               # Not needed, but never hurts
    next unless $entry =~ /\s*(\S+)\s+(\S+)/;
    my %velocity;
    $velocity{TIME} = $1;       # Column 1 is time
    $velocity{DISTANCE} = $2;   # Column 2 is distance
    push @velocities2, \%velocity;
}
say "The first entry as an array of hashes";
say "Distance: " . $velocities2[0]->{DISTANCE};
say "Time: " . $velocities2[0]->{TIME};
#
# As objects (The best!)
#
my @velocities3;
for my $entry ( @array ) {
    chomp $entry;               # Not needed, but never hurts
    next unless $entry =~ /\s*(\S+)\s+(\S+)/;
    my $time = $1;       # Column 1 is time
    my $distance = $2;   # Column 2 is distance
    my $velocity = Local::Velocity->new( $distance, $time );
    push @velocities3, $velocity;
}
say "The first entry as an object";
say "Distance: " . $velocities3[0]->distance;
say "Time: " . $velocities3[0]->time;

package Local::Velocity;

sub new {
    my $class    = shift;
    my $distance = shift;
    my $time     = shift;

    my $self = {};
    bless $self, $class;
    $self->distance( $distance );
    $self->time( $time );
    return $self;
}

sub distance {
    my $self     = shift;
    my $distance = shift;

    if ( defined $distance ) {
        $self->{DISTANCE} = $distance;
    }
    return $self->{DISTANCE};
}

sub time {
    my $self    = shift;
    my $time    = shift;

    if ( defined $time ) {
        $self->{TIME} = $time;
    }
    return $self->{TIME};
}

package main;
__DATA__
500 3.6673656 
----------
1000 3.2707536
----------
1500 3.2356145
----------
2000 3.0495141
----------
2500 3.016674

The first way is what you asked: Two parallel arrays. The problem with this method is that you are now forced to keep two separate data structures in order. If you pass a time and distance, you have to pass two separate data elements. If you modify one, you have to modify the other. If you push or pop from one, you have to do it to the other.

Not too bad with just two, but imagine having to do this with a dozen or more.
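
To make that bookkeeping concrete, here is a small sketch of what removing one record costs with parallel arrays. The values come from the question (time in the first column, distance in the second); the index being dropped is arbitrary and only there for illustration.

```perl
use strict;
use warnings;
use feature qw(say);

# Sample values from the question: column 1 is time, column 2 is distance.
my @times     = ( 500, 1000, 1500 );
my @distances = ( 3.6673656, 3.2707536, 3.2356145 );

# Dropping the second record means touching BOTH arrays at the same index.
my $index = 1;
splice @times,     $index, 1;
splice @distances, $index, 1;

# Forget one of the two splice calls and the arrays silently fall out of step.
for my $i ( 0 .. $#times ) {
    say "time $times[$i] -> distance $distances[$i]";
}
```

Nothing in the language ties the two arrays together, so keeping them aligned is entirely up to you.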

The second way uses references, which let you build more complex data structures. Each time and distance pair is kept together as a single element of one array. push one, and you push the other. pop one, and you pop the other. If you pass your time and distance to a subroutine, you only need to pass a single entry.
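
As a rough sketch of that point, assuming each record is stored as [ time, distance ] in the column order of the question, a single pop or a single subroutine argument carries both values. The describe helper is made up for this example.

```perl
use strict;
use warnings;
use feature qw(say);

# Each record is an array reference: [ time, distance ].
my @records = (
    [ 500,  3.6673656 ],
    [ 1000, 3.2707536 ],
    [ 1500, 3.2356145 ],
);

# describe() is a hypothetical helper: it receives one reference, not two values.
sub describe {
    my $record = shift;
    return "time $record->[0], distance $record->[1]";
}

# pop removes the time and the distance together.
my $last = pop @records;
say describe($last);
say "records left: " . scalar @records;
```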

The third way takes the concept of references up a notch. Instead of using an array to store your two values, you use a hash. The advantage is that each element in the hash has a name. Is the first entry or second entry the distance? It doesn't matter, it's the entry labeled DISTANCE. You get the same advantages as with an array of arrays, but now you have labeled which is which. Imagine a person record with names, phone numbers, addresses, etc., and you can see the advantage.
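
A short sketch of why the names help, again using the question's values. The NOTE field is invented here purely to show that adding a field leaves existing code alone.

```perl
use strict;
use warnings;
use feature qw(say);

my @records = (
    { TIME => 500,  DISTANCE => 3.6673656 },
    { TIME => 1000, DISTANCE => 3.2707536 },
    { TIME => 1500, DISTANCE => 3.2356145 },
);

# Find the record with the smallest distance by name, not by position.
my ($closest) = sort { $a->{DISTANCE} <=> $b->{DISTANCE} } @records;
say "Smallest distance $closest->{DISTANCE} at time $closest->{TIME}";

# Adding a field later does not disturb code that reads TIME or DISTANCE.
$closest->{NOTE} = 'minimum';    # NOTE is a made-up field for illustration
say join ', ', sort keys %$closest;
```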

The final way uses objects, which, as you can see, are very similar to hashes. You don't work with a hash or array directly. You have a Local::Velocity object that contains a time and a distance.

It seems a bit more complex, but objects have a lot of advantages:

  • There's no issue whether an entry is DISTANCE, Distance, or distance, and there's no issue of misspelling distanse. You have a method called distance. Mess up the name, and your program dutifully crashes instead of continuing with bad data.
  • You can modify your object without affecting your program. For example, you could add a method called velocity that returns the velocity computed from the object's time and distance. Or you might want to add a direction to your velocity. Modifying the object won't affect your program.
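
Both points can be sketched with a trimmed-down variant of the class. This is not the exact code from the listing: the accessors here are read-only to keep it short, and the velocity method is a hypothetical addition. The eval is only there so the example can print the error instead of stopping.

```perl
use strict;
use warnings;
use feature qw(say);

{
    package Local::Velocity;

    sub new {
        my ( $class, $distance, $time ) = @_;
        return bless { DISTANCE => $distance, TIME => $time }, $class;
    }

    sub distance { return $_[0]->{DISTANCE} }
    sub time     { return $_[0]->{TIME} }

    # Hypothetical addition: callers get a new method, nothing else changes.
    sub velocity {
        my $self = shift;
        return $self->distance() / $self->time();
    }
}

my $entry = Local::Velocity->new( 3.6673656, 500 );
say "Velocity: " . $entry->velocity;

# A misspelled method name is a fatal error, caught here only to show the message.
my $ok = eval { $entry->distanse; 1 };
say "Died: $@" unless $ok;

# A misspelled hash key, by contrast, just yields undef.
my %plain = ( DISTANCE => 3.6673656, TIME => 500 );
say defined $plain{DISTANSE} ? "found" : "undef, and the program carries on";
```

Code that only calls distance and time keeps working after velocity is added, which is the sense in which the object can change without affecting the rest of the program.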

Object-oriented Perl allows you to create extremely complex data types without having to remember how you structured them. It's why most new modules are object-oriented.

David W. answered Dec 21 '22 23:12
