Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested while loop to calculate distance for multiple destinations

Oke I'm messing up pretty bad now. I wanted to create a proper script using stricts and warnings (still a challenge for me;). But now I'm completely lost. I've been looking at so many examples that I'm totally confused. What I'm trying is to calculate the distance between 2 points using lat/lon. I think I have that part covered with gis::distance. BUT the problem is I'm trying to find destinations that are within 5000m of each other. (and skip if the destinations are the same). So when it finds a destination that's within 5000m of another destination I want it to place it after the last element in the first file.

Both input files are the same, here's how they look. Both files have about 45k lines.

Europe;3;France;23;Parijs;42545;48,856555;2,350976
Europe;3;France;23;Parisot;84459;44,264381;1,857827
Europe;3;France;23;Parlan;11337;44,828976;2,172435
Europe;3;France;23;Parnac;35670;46,4533;1,4425
Europe;3;France;23;Parnans;22065;45,1097;5,1456

Let's say that 2 of these destinations are near each other, I'm trying to output it like so:

Europe;3;France;23;Parijs;42545;48,856555;2,350976;Parlan;11337;200
Europe;3;France;23;Parisot;84459;44,264381;1,857827;
Europe;3;France;23;Parlan;11337;44,828976;2,172435;
Europe;3;France;23;Parnac;35670;46,4533;1,4425;Parisot;84459;2000;Parnans;22065;350
Europe;3;France;23;Parnans;22065;45,1097;5,1456;

The actual result will match more than 2 of course. In the outputfile the matched destination, destination id and calculated distance are added. It's possible that there are multiple matches for each destination. Pff this is really hard to explain haha. As i said I'm using strict&warnings and narrowed the errors down to a minimum but still not completely. These are the errors:

Global symbol "$infile1" requires explicit package name at E:\etc.pl line 17.
Execution of E:\etc.pl aborted due to compilation errors.

This is the code I have so far. I'm not making heads or tails with the my declarations as well.

Can someone help me? (maybe this isn't the most effective way but for now it helps me understand perl more, step by step)

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);

my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!\n";

my $maxdist = 5000;
my $mindist = 0.0001;

while ( my @infile1 ){ 
    my @elements = split(";",$infile1);

    my $lat1 = $elements[6];
    my $lon1 = $elements[7];

    $lat1 =~ s/,/./g;
    $lon1 =~ s/,/./g;

    seek my $infile2, 0, 0;

    print "1. $lat1\n";
    print "2. $lon1\n";

    while ( my @infile2 ){
        my @loopelements = split(";",$infile2);

        my $lat2 = $loopelements[6];
        my $lon2 = $loopelements[7];

        $lat2 =~ s/,/./g;
        $lon2 =~ s/,/./g;

        print "3. $lat1\n";
        print "4. $lon1\n";

        my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

        print "5. $distance\n";

        my $afstand = sprintf("%.4f",$distance);

        print "6. $afstand\n";

        if (($afstand < $maxdist) and (!($elements[4] == $loopelements[4]))){ 
            push (@elements, $afstand,$loopelements[4],$loopelements[5]);
            print "7. $afstand\n";
            } else {
                next;
                }
        }

  @elements = join(";",@elements);  # add ';' to all elements
  print OUTFILE "@elements";
  #if ($i == 10) {last;}
  }
close(INFILE1);
close(INFILE2);
close(OUTFILE);

--------------- EDIT --------------

Well, here i am again. I've been looking at your updated code and this is a pretty intense version of mine haha. I honestly have to say i only understand half of it. It's still pretty helpfull; al of it! I decided to stick to my original script design with your improvements but it's still not working. I have a few questions about that if you don't mind:

I made af few adjustments in the script. The first one is that it now skips latlons with a zero because that would give a useless result. In the same line it also skips empty cells which is also useless. I have done this for both the infiles.

Oh and where i said elements[4] i ment elements[5], so it would be numbers. So i switched the ne for a != if i'm not mistaken. But i think i created an infinite loop again because it's not looping through the second file. I know i may seem stubborn but i wanted to first understand my original script and start working on your version as soon as i got mine running. The seek function isn't working properly i think. Here is the script as it is now.

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);

my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!\n";

my $maxdist = 3000;
my $mindist = 0.0001;

while (my $infile1 = <$INFILE1> ){
  chomp $infile1;
  my @elements = split(";",$infile1);

  print "1. $elements[6]\n";
  print "2. $elements[7]\n";

  my $lat1 = $elements[6];
  my $lon1 = $elements[7];

if ((($lat1 and $lon1) ne '0') and (!($lat1 and $lon1) eq "")){
        $lat1 =~ s/,/./;
        $lon1 =~ s/,/./;
        print "lat1: $lat1\n";
        print "lon1: $lon1\n";  
        } else {
            next;
            }

  print "3. $lat1\n";
  print "4. $lon1\n";

  seek $INFILE2, 0, 0;

  while ( my $infile2 = <$INFILE2> ){
    chomp $infile2;
    my @loopelements = split(";",$infile2);

print "5. $elements[6]\n";
print "6. $elements[7]\n";

    my $lat2 = $loopelements[6];
    my $lon2 = $loopelements[7];

if ((($lat2 and $lon2) ne '0') and (!($lat2 and $lon2) eq "")){
        $lat2 =~ s/,/./;
        $lon2 =~ s/,/./;
        print "lat2: $lat1\n";
        print "lon2: $lon1\n";  
        } else {
            next;
            }

my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

print "7. $distance\n";

my $afstand = sprintf("%.4f",$distance);

print "8. $afstand\n";

if ($afstand < $maxdist && $elements[4] != $loopelements[4]){ 
  push (@elements, $afstand, $loopelements[4],$loopelements[5]);
  print "9. $afstand\n";
    } else {
        next;
        }
  }
print $OUTFILE join(";",@elements), "\n";
}

close($INFILE1);
close($INFILE2);
close($OUTFILE);
like image 259
Jan Avatar asked Jul 24 '12 06:07

Jan


1 Answers

You've got it pretty good already. Let's take a look at your error messages.

Global symbol "$infile1" requires explicit package name at E:\etc.pl line 17.

This one is simple. In Perl, all variable names are case-sensitive. At the top, you create a lexical variable $INFILE1. I'll talk more about lexical in a minute.

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";

Here you've got it in all-caps, which is ok. You may do that if it helps you to remember that it's a lexical file handle (file handles used to be global and named like INFILE1). But later on (in line 17) you use $infile1.

my @elements = split(";",$infile1);

You have not declared that variable (with my), so it throws this error. But that is not all.

I believe you are trying to read from that file handle. But that does not work. I'll explain this in steps. - In fact, you've built an infinite loop, but you are not yet aware of it.

    while ( my @infile1 ){ 

This while loop will not stop. Ever. The declaration of @infile1 with my always returns a true value, because it always works. So you'll never break the loop.

  • I guess you're trying to read the file line by line. So let's see how we can do that:

    while (my $infile1 = <$INFILE1> ){ 
      my @elements = split(";",$infile1);
    

    You need to read from the file like this. Now the assignment in the while loop's head will be true only as long as there is a line returned from the file handle. Once it is at the end of the file, it will return undef, thus ending the loop. Yay. Also note how your $infile1 in the next line with split now is correct.

    You also need to add chomp to the mix, because there are new line characters at the end of the file:

    while (my $infile1 = <$INFILE1> ){ 
      chomp $infile1;
      my @elements = split(";",$infile1);
    
  • Next is the seek line. This looks like you want to read the second file from the beginning for each line of the first file. That makes sense in a way, but is very inefficient. I'll talk about that later. You do need to change the my though. You don't have to create a new variable here. Also, use the correct name:

    seek $INFILE2, 0, 0;
    
  • Let's instead fix the second while loop:

    while (my $infile2 = <$INFILE2>){
      chomp $infile2;
      my @loopelements = split(";",$infile2);
    
  • The next thing I noticed was in line 42:

    my $distance = distance($lat1, $lon1 => $lat2, $lon2);
    

    Don't worry, there is nothing wrong here. I'd just like to note that the => is another way to write a comma (,). It's called the fat comma sometimes and it makes it easier to read, for example, hash assignments.

  • In line 50 you've already got the distance.

    if (($afstand < $maxdist) and (!($elements[4] == $loopelements[4]))){     
    

    and is usually used to do error checking. See the perldoc as to why. You should use && instead. Because it has higher precedence, you can leave out the parenthesis. You can also change your !($a == $b) construct to use the != operator instead. But since it holds the city name, and that is a string and not a number, you need to use ne, which is the opposite of eq. So this line now becomes:

    if ($afstand < $maxdist && $elements[4] ne $loopelements[4]){
    

    It's a lot better to read, isn't it?

  • In line 58 you join your array @elements and assign it to itself. That is rather strange. It will replace the array with a new array that has only one element - the joined string. Let's leave that line until the next bullet and look at it then.

  • In line 59 you've got a print statement, but you are now using a global file handle OUTFILE that you have never created. Instead, you need to use your lexical file handle from the top, $OUTFILE. If we now add the join from the line above directly to the print statement and also add a \n new line character at the end, the line becomes:

    print $OUTFILE join(";",@elements), "\n";
    
  • Now only the last part remains: You need to close the file handles, but again you're using global ones. Use your lexical ones instead:

    close($INFILE1);
    close($INFILE2);
    close($OUTFILE);
    

The complete code now looks like this:

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);

my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!\n";

my $maxdist = 5000;
my $mindist = 0.0001;

while (my $infile1 = <$INFILE1> ){
  chomp $infile1;
  my @elements = split(";",$infile1);

  my $lat1 = $elements[6];
  my $lon1 = $elements[7];

  $lat1 =~ s/,/./g;
  $lon1 =~ s/,/./g;

  print "1. $lat1\n";
  print "2. $lon1\n";

  seek $INFILE2, 0, 0;

  while ( my $infile2 = <$INFILE2> ){
    chomp $infile2;
    my @loopelements = split(";",$infile2);

    my $lat2 = $loopelements[6];
    my $lon2 = $loopelements[7];

    $lat2 =~ s/,/./g;
    $lon2 =~ s/,/./g;

    print "3. $lat1\n";
    print "4. $lon1\n";

    my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

    print "5. $distance\n";

    my $afstand = sprintf("%.4f",$distance);

    print "6. $afstand\n";

    if ($afstand < $maxdist && $elements[4] ne $loopelements[4]){ 
      push (@elements, $afstand,$loopelements[4],$loopelements[5]);
      print "7. $afstand\n";
    } else {
      next;
    }
  }

  print $OUTFILE join(";",@elements), "\n";
}

close($INFILE1);
close($INFILE2);
close($OUTFILE);

Now to the way your algorithm works: It would be a lot more efficient to first read the complete second file and then do the comparison with the first file in each iteration. That way, you only have to read the file once.

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);
use feature qw(say);

my $inputfile1 = shift || die "first file missing";
my $inputfile2 = shift || die "second file missing";
my $outputfile = shift || die "output file missing!";

# Read the second file first
my @file2; # save the lines of INFILE2 as array refs
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!";
while ( my $infile2 = <$INFILE2> ){ 
  chomp $infile2;
  my @loopelements = split(/;/, $infile2);

  $loopelements[6] =~ y/,/./;
  $loopelements[7] =~ y/,/./;

  push @file2, \@loopelements;
}
close($INFILE2);

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!";

my $maxdist = 5000;
my $mindist = 0.0001;

while (my $infile1 = <$INFILE1> ){
  chomp $infile1;
  my @elements = split(";",$infile1);

  my $lat1 = $elements[6];
  my $lon1 = $elements[7];

  $lat1 =~ y/,/./;
  $lon1 =~ y/,/./;

  say "1. $lat1";
  say "2. $lon1";

  FILE2: foreach my $loopelements ( @file2 ){
    my ($lat2, $lon2) = @$loopelements[6, 7];

    say "3. $lat2";
    say "4. $lon2";

    my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

    say "5. $distance";

    my $afstand = sprintf("%.4f",$distance);

    say "6. $afstand";

    if ($afstand < $maxdist && $elements[4] ne $$loopelements[4]){ 
      push (@elements, $afstand, $$loopelements[4], $$loopelements[5]);
      say "7. $afstand";
    } else {
      next FILE2;
    }
  }

  say $OUTFILE join(";",@elements);
}

close($INFILE1);
close($OUTFILE);

Now, let's look at what I changed.

  • First of all, I added use feature qw(say) to the top. say is the same as print, but a new line is added. That saves some typing. Also see feature for more info.
  • I removed the "\n" characters from all die statements. If you put a new line there, it removes the line number from the output. If that was intended, ignore this suggestion please. Here's what Perldoc has to say to this:

    If the last element of LIST does not end in a newline, the current script line number and input line number (if any) are also printed, and a newline is supplied.

  • The most important part is the change to the algorithm I made. I moved the while loop for the second file outside of the other while loop, to the top of the program. The file is slurped into the array @file2. Each element holds an array ref with the fields of the line. The commas are already changed to full stop signs.

    I changed the s/// substitution operator to a y/// (short of tr///) transliteration operator. Since you only change one sign, this is sufficient. It is also faster. Even if you'd leave the regex substitution in, you would not need the /g modifier, because a float only ever has one comma, so it does not have to do multiple substitutions.

    Now all these things are done only once for file2. This saves considerable computing time when done 40k+ times.

  • I changed the wording of the error messages so they can be understood better. That is preference. You don't have to do that.

  • I changed the second while to a foreach loop to iterate over the elements of the new @file2 array. I did leave the $lat2 and $lon2 vars in for clarity. You could omit these, and directly work with the array(ref) elements. In the assignment I used an array slice to put it into one line.

  • Since $loopelements replaces @loopelements and it is an array ref, we need to access the data stored within using $$loopelements[$index] now.

I hope this helps you understand why I have made certain improvements.

Please remember that in Perl There Is More Than One Way To Do It - that is a good thing. There seldom is the right way, but often many ways that lead to the goal. Some are more efficient than others, while these others sometimes are easier to maintain. The trick is to find the right balance between these two cases.


Update:

Here are the input files I have used. You will need them to compare results.

file1.csv:

Europe;3;France;23;Parijs;42545;48,856555;2,350976
Europe;3;France;23;Parisot;84459;44,264381;1,857827
Europe;3;France;23;Parlan;11337;44,828976;2,172435
Europe;3;France;23;Parnac;35670;46,4533;1,4425
Europe;3;France;23;Parnans;22065;45,1097;5,1456

file2.csv:

Europe;3;France;23;Parlan;11337;44,828976;2,172435
Europe;3;France;23;Parnac;35670;46,4533;1,4425
Europe;3;France;23;Parnans;22065;45,1097;5,1456
Europe;3;France;23;Parijs;42545;48,856555;2,350976
Europe;3;France;23;Parisot;84459;44,264381;1,857827
like image 70
simbabque Avatar answered Sep 28 '22 05:09

simbabque