Before using R, I used quite a bit of Perl. In Perl, I would often use hashes, and lookups of hashes are generally regarded as fast in Perl.
For example, the following code will populate a hash with up to 10000 key/value pairs, where the keys are random letters and the values are random integers. Then, it does 10000 random lookups in that hash.
#!/usr/bin/perl -w use strict; my @letters = ('a'..'z'); print @letters . "\n"; my %testHash; for(my $i = 0; $i < 10000; $i++) { my $r1 = int(rand(26)); my $r2 = int(rand(26)); my $r3 = int(rand(26)); my $key = $letters[$r1] . $letters[$r2] . $letters[$r3]; my $value = int(rand(1000)); $testHash{$key} = $value; } my @keyArray = keys(%testHash); my $keyLen = scalar @keyArray; for(my $j = 0; $j < 10000; $j++) { my $key = $keyArray[int(rand($keyLen))]; my $lookupValue = $testHash{$key}; print "key " . $key . " Lookup $lookupValue \n"; }
Now that increasingly, I am wanting to have a hash-like data structure in R. The following is the equivalent R code:
testHash <- list() for(i in 1:10000) { key.tmp = paste(letters[floor(26*runif(3))], sep="") key <- capture.output(cat(key.tmp, sep="")) value <- floor(1000*runif(1)) testHash[[key]] <- value } keyArray <- attributes(testHash)$names keyLen = length(keyArray); for(j in 1:10000) { key <- keyArray[floor(keyLen*runif(1))] lookupValue = testHash[[key]] print(paste("key", key, "Lookup", lookupValue)) }
The code seem to be doing equivalent things. However, the Perl one is much faster:
>time ./perlHashTest.pl real 0m4.346s user **0m0.110s** sys 0m0.100s
Comparing to R:
time R CMD BATCH RHashTest.R real 0m8.210s user **0m7.630s** sys 0m0.200s
What explains the discrepancy? Are lookups in R lists just not good?
Increasing to 100,000 list length and 100,000 lookups only exaggerates the discrepancy? Is there a better alternative for the hash data structure in R than the native list()?
There are some operations which are not efficiently supported by hash tables, such as iterating over all the elements whose keys are within a certain range, finding the element with the largest key or smallest key, and so on.
HASH VALUES can be any R value, vector or object. PASS-BY REFERENCE. Environments and hashes are special objects in R because only one copy exists globally. When provide as an argument to a function, no local copy is made and any changes to the hash in the functions are reflected globally.
R Language Hashmaps Environments as hash maps Note: in the subsequent passages, the terms hash map and hash table are used interchangeably and refer to the same concept, namely, a data structure providing efficient key lookup through use of an internal hash function.
The hash() function returns a 128-bit, 160-bit, or 256-bit hash of the input data, depending on the algorithm that you select. This function provides 2128- 2256 distinct return values and is intended for cryptographic purposes. The function almost never produces the same output for two different inputs.
The underlying reason is that R lists with named elements are not hashed. Hash lookups are O(1), because during insert the key is converted to an integer using a hash function, and then the value put in the space hash(key) % num_spots
of an array num_spots
long (this is a big simplification and avoids the complexity of dealing with collisions). Lookups of the key just require hashing the key to find the value's position (which is O(1), versus a O(n) array lookup). R lists use name lookups which are O(n).
As Dirk says, use the hash package. A huge limitation with this is that it uses environments (which are hashed) and overriding of [
methods to mimic hash tables. But an environment cannot contain another environment, so you cannot have nested hashes with the hash function.
A while back I worked on implementing a pure hash table data structure in C/R that could be nested, but it went on my project back burner while I worked on other things. It would be nice to have though :-)
You could try environments and/or the hash package by Christopher Brown (which happens to use environments under the hood).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With