
Understanding code: Hash, grep for duplicates (modified to check for multiple elements)

Tags:

perl

Code:

@all_matches = grep {
    ! $seensentence{ $_->[0] . '-' . $_->[1] . '-' . $_->[5] }++
} @all_matches;

Purpose: This code removes duplicates of certain elements from the array @all_matches which is an AoA.

My attempt at full breakdown ( with ??..?? around where I'm unsure ):

Grep returns the elements of @all_matches which return true.

The key of the hash %seensentence is ??the three elements?? of @all_matches. Since a hash can only have unique keys, the first time through its value is incremented from undef (0) to 1. The next time through, it is a defined value, but the ! means grep returns the element only if the value is undef (unique value associated with that element).


My Questions:

(1) How can I turn {$_->[0] .'-'. $_->[1] .'-'. $_->[5]}++ into a HoH?

I was told this is another (idiomatic) way to accomplish it. A stab in the dark would be:

( {$_->[0] => 0,
$_->[1] => 0,
$_->[5] => 0} )++

(1b) I ask because I don't understand how the original is doing what I want it to. I read that -bareword is equivalent to "-bareword", so I tried {"$_->[0]" . "$_->[1]" . "$_->[5]"} and it seemed to work exactly the same. Still, I don't understand: is it treating each element as a key (a) separately (like an array of keys), or is it (b) (Correct:) all simultaneously (since . concatenates them all into one string), or is it (c) not doing what I think it is?

(2) What does this mean: $_->[0] || $_->[1] || $_->[5] ? It doesn't do the same as above.

I read that short-circuit logical operators return the last value evaluated, so it would check the value at {$_->[0]} and, if there was one, I thought the value there would be incremented; if not, it would check the next element until none were true, which is when grep passes the unique value on.


Thanks for your time, I tried to be as thorough as possible (to a fault?) but let me know if there is anything missing.

Jon asked Jul 09 '11 18:07




1 Answer

First, let's turn the grep into a foreach loop so that we can examine it more clearly. I'm going to expand some of the idioms into larger constructs for clarity's sake.

my @all_matches = ( ... );
{
    my %seen;
    my @no_dupes;
    foreach my $match ( @all_matches ) {
        my $first_item  = $match->[0];
        my $second_item = $match->[1];
        my $third_item  = $match->[5];
        my $key = join '-', $first_item, $second_item, $third_item;
        if( not $seen{ $key }++ ) {
            push @no_dupes, $match;
        }
    }
    @all_matches = @no_dupes;
}

In other words, the original coder is building a single hash key by joining elements 0, 1, and 5 of the array referenced by $match with hyphens. The post-increment on $seen{ $key } returns the old count, which is undef (false) the first time a key appears and a positive number (true) every time after that. So only the first row with a given key is pushed into @no_dupes, and any later duplicates are dropped.
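To see that behavior on real values, here is a minimal runnable sketch. The sample rows and the array-slice spelling of the key are my own illustration, not the original poster's data; only indices 0, 1, and 5 take part in the key.

```perl
use strict;
use warnings;

# Hypothetical sample rows; only indices 0, 1 and 5 are used for the key.
my @all_matches = (
    [ 'cat', 'sat', 'x', 'x', 'x', 'mat' ],
    [ 'cat', 'sat', 'y', 'y', 'y', 'mat' ],   # same 0/1/5 as the first row
    [ 'dog', 'sat', 'x', 'x', 'x', 'mat' ],
);

my %seen;

# Post-increment returns the old count: undef (false) on first sight,
# 1 or more (true) afterwards, so "!" keeps only the first occurrence.
@all_matches = grep { !$seen{ join '-', @{$_}[ 0, 1, 5 ] }++ } @all_matches;

print scalar(@all_matches), " rows kept\n";        # 2 rows kept
print "$_ => $seen{$_}\n" for sort keys %seen;     # cat-sat-mat => 2
                                                   # dog-sat-mat => 1
```

The counts left behind in %seen show how many rows shared each key, which is sometimes useful in its own right.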

The grep {} mechanism is just a more compact idiom (i.e., quicker to type, and no throwaway variables) to accomplish the same thing. If it works, why refactor it? What is it not doing that you need to improve upon?

To do the same with a HoH, you could do this:

my @all_matches = ( ... );
{
    my %seen;
    my @no_dupes;
    foreach my $match ( @all_matches ) {
        my $first_item  = $match->[0];
        my $second_item = $match->[1];
        my $third_item  = $match->[5];
        if( not $seen{ $first_item }->{ $second_item }->{ $third_item }++ ) {
            push @no_dupes, $match;
        }
    }
    @all_matches = @no_dupes;
}

Which could be translated back into a grep as follows:

my @all_matches = ( ... );
{
    my %seen;
    @all_matches = grep { not $seen{$_->[0]}->{$_->[1]}{$_->[5]}++ } @all_matches;
}

However, this is a case where I don't see a clear advantage to building a data structure, unless you intend to use %seen later for something else.
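One situation where the two approaches can give different results is when the elements themselves may contain the separator character. The sketch below uses made-up rows chosen to trigger that case; whether it matters depends on what your real data looks like.

```perl
use strict;
use warnings;

# Hypothetical rows: elements 0, 1 and 5 differ, yet join to the same string.
my @rows = (
    [ 'a-b', 'c',   0, 0, 0, 'd' ],
    [ 'a',   'b-c', 0, 0, 0, 'd' ],
);

my ( %seen_flat, %seen_nested );

# Flat key: both rows become "a-b-c-d", so the second row is dropped.
my @flat = grep { !$seen_flat{ join '-', @{$_}[ 0, 1, 5 ] }++ } @rows;

# Nested key: each element is its own hash level, so the rows stay distinct.
my @nested = grep { !$seen_nested{ $_->[0] }{ $_->[1] }{ $_->[5] }++ } @rows;

printf "flat: %d, nested: %d\n", scalar(@flat), scalar(@nested);
# flat: 1, nested: 2
```

If the data can never contain a hyphen, the flat key is equivalent and simpler; choosing a separator that cannot occur in the data is another way around the problem.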

With respect to the || operator, that's a different animal. I can't think of any useful way to employ it in this context. A short-circuit expression such as "$a || $b || $c" tests the boolean truth of $a. If it's true, the expression returns $a's value and $b and $c never get evaluated. If it's false, it checks $b the same way, and if $b is true, $c never gets evaluated. If both are false, the expression returns the value of $c, whatever it is. So the result is always a single one of the three values, never a combination of them.
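Used as a hash subscript, that means the key is usually just the first element. Here is a small sketch with invented rows that shows why it does not behave like the concatenated key:

```perl
use strict;
use warnings;

# Hypothetical rows that share element 0 but differ in elements 1 and 5.
my @rows = (
    [ 'cat', 'sat', 0, 0, 0, 'mat' ],
    [ 'cat', 'ran', 0, 0, 0, 'off' ],
);

my %seen;

# The subscript evaluates to $_->[0] whenever that is true, so the key is
# 'cat' for both rows and the second row is treated as a duplicate.
my @kept = grep { !$seen{ $_->[0] || $_->[1] || $_->[5] }++ } @rows;

print scalar(@kept), " row kept; keys: @{[ sort keys %seen ]}\n";
# 1 row kept; keys: cat
```

Elements 1 and 5 would only be consulted for rows whose element 0 is false (undef, 0, '0', or the empty string).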

DavidO answered Nov 15 '22 04:11