
Case Insensitive Unique Array Elements in Perl

Tags: perl, uniq
I am using the uniq function exported by the List::MoreUtils module to find the unique elements in an array. However, I want it to treat elements that differ only in case as duplicates. How can I do that?

Here is my script, which dumps the resulting array with Data::Dumper:

#! /usr/bin/perl

use strict;
use warnings;
use Data::Dumper qw(Dumper);
use List::MoreUtils qw(uniq);
use feature "say";

my @elements=<array is formed here>;

my @words=uniq @elements;

say Dumper \@words;

Output:

$VAR1 = [
          'John',
          'john',
          'JohN',
          'JOHN',
          'JoHn',
          'john john'
        ];

Expected output: john, john john

That is, only 2 elements. All the others should be filtered out, since they are the same word and differ only in case.

How can I remove the duplicate elements ignoring the case?

asked Oct 25 '12 17:10 by Neon Flash


1 Answer

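Lowercase every element with lc inside a map, then pass the result to uniq. The returned elements are the lowercased strings, which matches the expected output in the question: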

my @uniq_no_case = uniq map lc, @elements;

List::MoreUtils' uniq is case sensitive because it relies on the deduplicating behaviour of hash keys, and hash keys are compared case sensitively. In essence, the code for uniq looks like this:

sub uniq {
    my %seen = ();
    grep { not $seen{$_}++ } @_;
}
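
To illustrate why hash-based deduplication is case sensitive, here is a minimal sketch using only core Perl. The variable names are made up for the illustration and are not part of List::MoreUtils.

```perl
use strict;
use warnings;

# Hash keys are compared as exact strings, so differently cased
# spellings end up in separate slots.
my %seen;
$seen{$_}++ for qw(John john JOHN);
my $distinct = keys %seen;        # number of keys: 3

# Lowercasing before the lookup folds them into a single key.
my %seen_lc;
$seen_lc{lc $_}++ for qw(John john JOHN);
my $folded = keys %seen_lc;       # number of keys: 1

print "case-sensitive keys: $distinct, lowercased keys: $folded\n";
```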

If you want to use this sub directly in your own code, you could incorporate lc in there:

sub uniq_no_case {
    my %seen = ();
    grep { not $seen{$_}++ } map lc, @_;
}
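
As a rough, self-contained check, the sub above can be run against the data from the question. The second sub, uniq_keep_first, is a hypothetical variant and not part of the original answer: it lowercases only the lookup key, so the first spelling encountered is returned unchanged.

```perl
use strict;
use warnings;

# Sample data taken from the dump in the question.
my @elements = ('John', 'john', 'JohN', 'JOHN', 'JoHn', 'john john');

# Variant from the answer: every element is lowercased first,
# so the result contains lowercase strings only.
sub uniq_no_case {
    my %seen;
    return grep { not $seen{$_}++ } map lc, @_;
}

# Hypothetical variant: lowercase only the hash key, so the
# original casing of the first occurrence is kept.
sub uniq_keep_first {
    my %seen;
    return grep { not $seen{lc $_}++ } @_;
}

my @lowered = uniq_no_case(@elements);     # ('john', 'john john')
my @first   = uniq_keep_first(@elements);  # ('John', 'john john')

print join(', ', @lowered), "\n";
print join(', ', @first), "\n";
```

Which variant fits depends on whether the original casing matters to you. For text beyond plain ASCII, Perl 5.16 and later also provide fc (enabled with use feature 'fc'), which performs case folding and could be substituted for lc here.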

Explanation of how this works:

@_ contains the arguments passed to the subroutine, and they are fed to grep. grep returns every element for which the code block evaluates to true. The code block has a few finer points:

  • $seen{$_}++ returns 0 the first time an element is seen. The value is still incremented to 1, but only after the old value has been returned (as opposed to ++$seen{$_}, which would increment first and then return the new value).
  • Negating the result of the post-increment gives true for the first occurrence of a key and false for every later occurrence of the same key. Hence, the list is deduplicated.
  • Because grep is the last statement in the sub, the list it produces is also the return value of the sub.
  • map lc, @_ simply applies the lc function to every element of @_ before grep sees it.
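
The difference between post-increment and pre-increment described in the list above can be checked with a short sketch. The hash and variable names here are chosen purely for illustration.

```perl
use strict;
use warnings;

# Post-increment returns the old value (0 for a missing key),
# then stores the incremented value.
my %seen;
my $post_first  = $seen{john}++;   # 0, and $seen{john} is now 1
my $post_second = $seen{john}++;   # 1, and $seen{john} is now 2

# Pre-increment stores the new value first, so even the very
# first call returns a true value.
my %seen_pre;
my $pre_first = ++$seen_pre{john}; # 1

printf "post: %d then %d, pre: %d\n", $post_first, $post_second, $pre_first;

# With "not", only the first occurrence of each key passes the filter.
my %count;
my @kept = grep { not $count{$_}++ } qw(john john mary);

print join(', ', @kept), "\n";
```

With ++$count{$_} in the grep block instead, the negated result would be false every time and nothing would be kept.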
answered Sep 21 '22 12:09 by TLP