 

What's the most efficient way to check for duplicates in an array of data using Perl?


I need to see if there are duplicates in an array of strings. What's the most time-efficient way of doing it?

asked Jun 10 '10 by teukkam


2 Answers

One of the things I love about Perl is its ability to almost read like English. It just sort of makes sense.

use strict;
use warnings;

my @array = qw/yes no maybe true false false perhaps no/;

my %seen;
foreach my $string (@array) {
    # %seen counts occurrences; the post-increment is 0 (false)
    # the first time a string is seen, so only repeats get past this
    next unless $seen{$string}++;
    print "'$string' is duplicated.\n";
}

Output

'false' is duplicated.

'no' is duplicated.
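If you would rather collect the duplicates in a list than print them as they are found, the same %seen trick works inside grep. This is a sketch of my own rather than part of the answer above, and the @dups name is made up:

use strict;
use warnings;

my @array = qw/yes no maybe true false false perhaps no/;

my %seen;
# Keep a string exactly the second time it appears, so each
# duplicated value ends up in @dups once
my @dups = grep { $seen{$_}++ == 1 } @array;

print "'$_' is duplicated.\n" for @dups;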

answered Oct 07 '22 by Zaid


Turning the array into a hash is the fastest way [O(n)], though it's memory-inefficient. Using a for loop is a bit faster than grep, but I'm not sure why.

#!/usr/bin/perl

use strict;
use warnings;

# Sample data, borrowed from the first answer
my @array = qw/yes no maybe true false false perhaps no/;

my %count;
my %dups;
for (@array) {
    # second and later occurrences land in %dups
    $dups{$_}++ if $count{$_}++;
}
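For comparison, a grep version of the same hash-counting idea might look like this. It's my sketch, not code from the answer:

use strict;
use warnings;

my @array = qw/yes no maybe true false false perhaps no/;

my %count;
# grep keeps every occurrence after the first;
# map turns those occurrences into hash keys
my %dups = map { $_ => 1 } grep { $count{$_}++ } @array;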

A memory-efficient way is to sort the array in place and iterate through it looking for adjacent equal entries.

# not exactly sort in place, but Perl does a decent job optimizing it
@array = sort @array;

my $last;
my %dups;
for my $entry (@array) {
    $dups{$entry}++ if defined $last and $entry eq $last;
    $last = $entry;
}

This is O(n log n) because of the sort, but it only needs to store the duplicates rather than a second copy of the data in %count. Worst-case memory usage is still O(n) (when everything is duplicated), but if your array is large and there aren't a lot of duplicates, you'll win.

Theory aside, benchmarking shows the latter starts to lose on large arrays (over a million entries) with a high percentage of duplicates.
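If you want to reproduce that comparison, the core Benchmark module's cmpthese can time both approaches. This is a hedged sketch: the data set (a million values drawn from only a thousand possibilities, so lots of duplicates) and the iteration count are illustrative, not the benchmark the answer ran:

use strict;
use warnings;
use Benchmark qw(cmpthese);

# High duplicate percentage: a million values, only 1000 distinct
my @array = map { int rand 1000 } 1 .. 1_000_000;

cmpthese(5, {
    hash => sub {
        my (%count, %dups);
        for (@array) {
            $dups{$_}++ if $count{$_}++;
        }
    },
    sort => sub {
        my @sorted = sort @array;
        my ($last, %dups);
        for my $entry (@sorted) {
            $dups{$entry}++ if defined $last and $entry eq $last;
            $last = $entry;
        }
    },
});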

answered Oct 07 '22 by Schwern