Data structure for unique element storage

Question

I'm looking for a data structure which should preferably perform equal O(1)? for any number of elements when adding/removing/retrieving elements.

Here are some additional guidelines,

retrieving elements should not involve slow keys()
elements must be always unique and defined
element order is not significant
addition or removal of element should not involve iteration over other elements
gaps in retrieved list of elements are tolerable and can be represented with undef value

Please suggest better solution than,

sub uniqArrayFactory {
  my $members = [];
  my $seen = {};
  my $gaps = [];

  return sub {
    my (%arg) = @_;

    return $members if $arg{members};
    my $m;
    if (defined ($m = $arg{del})) {

      return if !$seen->{$m};
      ${ $seen->{$m} } = undef;
      push @$gaps, delete($seen->{$m});
    }
    elsif (defined ($m = $arg{add})) {

      return if $seen->{$m};
      if (@$gaps) {
        $seen->{$m} = pop @$gaps;
        ${ $seen->{$m} } = $m;
      }
      else {
        push @$members, $m;
        $seen->{$m} = \( $members->[-1] );
      }
    }
    return $m;
  };
}

UPDATE (usage)

my $fa = uniqArrayFactory();

$fa->(add => 10);
$fa->(del => 10);
my $members = $fa->(mebers => 1);

nwellnhof · Accepted Answer

keys and each are surprisingly slow indeed. But if you store each element as a value of a hash and use values, things get a low faster. With

use strict;
use warnings;
use Benchmark qw(:all);

my $i;
my $fa;
my %hash;

my %compare = (
  uarray => sub {
    $fa->(add => $i++);
    my $memb = $fa->(members => 1);
    for my $v (@$memb) { next if !defined $v; }
  },
  hash => sub {
    $hash{ $i } = $i;
    for my $v (values %hash) {}
    $i++;
  },
);

$i = 0; $fa = uniqArrayFactory(); %hash = ();
cmpthese(10000, \%compare);

sub uniqArrayFactory {
  my $members = [];
  my $seen = {};
  my $gaps = [];

  return sub {
    my (%arg) = @_;

    return $members if exists $arg{members};
    my $m;
    if (defined ($m = $arg{del})) {

      return if !$seen->{$m};
      ${ $seen->{$m} } = undef;
      push @$gaps, delete($seen->{$m});
    }
    elsif (defined ($m = $arg{add})) {

      return if $seen->{$m};
      if (@$gaps) {
        $seen->{$m} = pop @$gaps;
        ${ $seen->{$m} } = $m;
      }
      else {
        push @$members, $m;
        $seen->{$m} = \( $members->[-1] );
      }
    }
    return $m;
  };
}

I get:

         Rate   hash uarray
hash   3205/s     --    -6%
uarray 3401/s     6%     --

mob · Answer

Ironically, maybe Tie::IxHash, which was motivated by the desire to retrieve the keys of a hash in a specified order, is as close as you're going to get to what you want.

In the Tie::IxHash implementation, keys and values are stored in array references. keys returns a copy of the set of keys, but something like (tied %hash)->[1] would give you direct access to it.

Removing elements in a Tie::IxHash is O(n). A possible workaround for that would be to replace values with undef rather than deleting them. That is, preferring

$ixhash{$obsolete_key} = undef;

to

delete $ixhash{$obsolete_key};

Or if you were able to pool your deletions -- if you could organize your code so that you usually called delete on several keys around the same time and in between other operations on the hash -- then there are opportunities for improving on Tie::IxHash.

Data structure for unique element storage

Tags:

list

unique

perl

xs

mpapec

2 Answers

nwellnhof

mob

Recent Activity

Donate For Us

Data structure for unique element storage

Tags:

list

unique

perl

xs

mpapec

2 Answers

nwellnhof

mob

Related questions

Recent Activity

Donate For Us