Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure for unique element storage

Tags:

list

unique

perl

xs

I'm looking for a data structure which should preferably perform equal O(1)? for any number of elements when adding/removing/retrieving elements.

Here are some additional guidelines,

  • retrieving elements should not involve slow keys()
  • elements must be always unique and defined
  • element order is not significant
  • addition or removal of element should not involve iteration over other elements
  • gaps in retrieved list of elements are tolerable and can be represented with undef value

Please suggest better solution than,

sub uniqArrayFactory {
  my $members = [];
  my $seen = {};
  my $gaps = [];

  return sub {
    my (%arg) = @_;

    return $members if $arg{members};
    my $m;
    if (defined ($m = $arg{del})) {

      return if !$seen->{$m};
      ${ $seen->{$m} } = undef;
      push @$gaps, delete($seen->{$m});
    }
    elsif (defined ($m = $arg{add})) {

      return if $seen->{$m};
      if (@$gaps) {
        $seen->{$m} = pop @$gaps;
        ${ $seen->{$m} } = $m;
      }
      else {
        push @$members, $m;
        $seen->{$m} = \( $members->[-1] );
      }
    }
    return $m;
  };
}

UPDATE (usage)

my $fa = uniqArrayFactory();

$fa->(add => 10);
$fa->(del => 10);
my $members = $fa->(mebers => 1);
like image 817
mpapec Avatar asked May 19 '16 12:05

mpapec


2 Answers

keys and each are surprisingly slow indeed. But if you store each element as a value of a hash and use values, things get a low faster. With

use strict;
use warnings;
use Benchmark qw(:all);

my $i;
my $fa;
my %hash;

my %compare = (
  uarray => sub {
    $fa->(add => $i++);
    my $memb = $fa->(members => 1);
    for my $v (@$memb) { next if !defined $v; }
  },
  hash => sub {
    $hash{ $i } = $i;
    for my $v (values %hash) {}
    $i++;
  },
);

$i = 0; $fa = uniqArrayFactory(); %hash = ();
cmpthese(10000, \%compare);

sub uniqArrayFactory {
  my $members = [];
  my $seen = {};
  my $gaps = [];

  return sub {
    my (%arg) = @_;

    return $members if exists $arg{members};
    my $m;
    if (defined ($m = $arg{del})) {

      return if !$seen->{$m};
      ${ $seen->{$m} } = undef;
      push @$gaps, delete($seen->{$m});
    }
    elsif (defined ($m = $arg{add})) {

      return if $seen->{$m};
      if (@$gaps) {
        $seen->{$m} = pop @$gaps;
        ${ $seen->{$m} } = $m;
      }
      else {
        push @$members, $m;
        $seen->{$m} = \( $members->[-1] );
      }
    }
    return $m;
  };
}

I get:

         Rate   hash uarray
hash   3205/s     --    -6%
uarray 3401/s     6%     --
like image 92
nwellnhof Avatar answered Oct 02 '22 20:10

nwellnhof


Ironically, maybe Tie::IxHash, which was motivated by the desire to retrieve the keys of a hash in a specified order, is as close as you're going to get to what you want.

In the Tie::IxHash implementation, keys and values are stored in array references. keys returns a copy of the set of keys, but something like (tied %hash)->[1] would give you direct access to it.

Removing elements in a Tie::IxHash is O(n). A possible workaround for that would be to replace values with undef rather than deleting them. That is, preferring

$ixhash{$obsolete_key} = undef;

to

delete $ixhash{$obsolete_key};

Or if you were able to pool your deletions -- if you could organize your code so that you usually called delete on several keys around the same time and in between other operations on the hash -- then there are opportunities for improving on Tie::IxHash.

like image 30
mob Avatar answered Oct 02 '22 22:10

mob