How to find identical byte[]-objects in two arrays concurrently?

Tags:

I'm trying to implement an collision attack on hashes (I'm visiting the course 'cryptography'). Therefore I have two arrays of hashes (= byte-sequences byte[]) and want to find hashes which are present in both arrays. After some research and a lot of thinking I am sure that the best solution on a single-core machine would be a HashSet (add all elements of the first array and check via contains if elements of the second array are already present).

However, I want to implement a concurrent solution, since I have access to a machine with 8 cores and 12 GB RAM. The best solution I can think of is ConcurrentHashSet, which could be created via Collections.newSetFromMap(new ConcurrentHashMap<A,B>()). Using this data structure I could add all elements of the first array in parallel and - after all elements where added - I can concurrently check via contains for identical hashes.

So my question is: Do you know an algorithm designed for this exact problem? If not, do you have experience using such a ConcurrentHashSet concerning problems and effective runtime complexity? Or can you recommend another prebuilt data structure which could help me?

PS: If anyone is interested in the details: I plan to use Skandium to parallelize my program.

751

asked Jan 02 '12 12:01

Florian Pilz

1 Answers

I think it would be a complete waste of time to use any form of HashMap. I am guessing you are calculating multi-byte hashes of various data, these are already hashes, there is no need to perform any more hashing on them.

Although you do not state it, I am guessing your hashes are byte sequences. Clearly either a trie or a dawg would be ideal to store these.

I would suggest therefore you implement a trie/dawg and use it to store all of the hashes in the first array. You could then use all of your computing power in parallel to lookup each element in your second array in this trie. No locks would be required.

Added

Here's a simple Dawg implementation I knocked together. It seems to work.

public class Dawg {
  // All my children.
  Dawg[] children = new Dawg[256];
  // Am I a leaf.
  boolean isLeaf = false;

  // Add a new word.
  public void add ( byte[] word ) {
    // Finds its location, growing as necessary.
    Dawg loc = find ( word, 0, true );
    loc.isLeaf = true;
  }

  // String form.
  public void add ( String word ) {
    add(word.getBytes());
  }

  // Returns true if word is in the dawg.
  public boolean contains ( byte [] word ) {
    // Finds its location, no growing allowed.
    Dawg d = find ( word, 0, false );
    return d != null && d.isLeaf; 
  }

  // String form.
  public boolean contains ( String word ) {
    return contains(word.getBytes());
  }

  // Find the Dawg - growing the tree as necessary if requested.
  private Dawg find ( byte [] word, int i, boolean grow ) {
    Dawg child = children[word[i]];
    if ( child == null ) {
      // Not present!
      if ( grow ) {
        // Grow the tree.
        child = new Dawg();
        children[word[i]] = child;
      }
    }
    // Found it?
    if ( child != null ) {
      // More to find?
      if ( i < word.length - 1 ) {
        child = child.find(word, i+1, grow);
      }
    }
    return child;
  }

  public static void main ( String[] args ) {
    Dawg d = new Dawg();
    d.add("H");
    d.add("Hello");
    d.add("World");
    d.add("Hell");
    System.out.println("Hello is "+(d.contains("Hello")?"in":"out"));
    System.out.println("World is "+(d.contains("World")?"in":"out"));
    System.out.println("Hell is "+(d.contains("Hell")?"in":"out"));
    System.out.println("Hal is "+(d.contains("Hal")?"in":"out"));
    System.out.println("Hel is "+(d.contains("Hel")?"in":"out"));
    System.out.println("H is "+(d.contains("H")?"in":"out"));
  }
}

Added

This could be a good start at a concurrent lock-free version. These things are notoriously difficult to test so I cannot guarantee this will work but to my mind it certainly should.

import java.util.concurrent.atomic.AtomicReferenceArray;


public class LFDawg {
  // All my children.
  AtomicReferenceArray<LFDawg> children = new AtomicReferenceArray<LFDawg> ( 256 );
  // Am I a leaf.
  boolean isLeaf = false;

  // Add a new word.
  public void add ( byte[] word ) {
    // Finds its location, growing as necessary.
    LFDawg loc = find( word, 0, true );
    loc.isLeaf = true;
  }

  // String form.
  public void add ( String word ) {
    add( word.getBytes() );
  }

  // Returns true if word is in the dawg.
  public boolean contains ( byte[] word ) {
    // Finds its location, no growing allowed.
    LFDawg d = find( word, 0, false );
    return d != null && d.isLeaf;
  }

  // String form.
  public boolean contains ( String word ) {
    return contains( word.getBytes() );
  }

  // Find the Dawg - growing the tree as necessary if requested.
  private LFDawg find ( byte[] word, int i, boolean grow ) {
    LFDawg child = children.get( word[i] );
    if ( child == null ) {
      // Not present!
      if ( grow ) {
        // Grow the tree.
        child = new LFDawg();
        if ( !children.compareAndSet( word[i], null, child ) ) {
          // Someone else got there before me. Get the one they set.
          child = children.get( word[i] );
        }
      }
    }
    // Found it?
    if ( child != null ) {
      // More to find?
      if ( i < word.length - 1 ) {
        child = child.find( word, i + 1, grow );
      }
    }
    return child;
  }

  public static void main ( String[] args ) {
    LFDawg d = new LFDawg();
    d.add( "H" );
    d.add( "Hello" );
    d.add( "World" );
    d.add( "Hell" );
    System.out.println( "Hello is " + ( d.contains( "Hello" ) ? "in" : "out" ) );
    System.out.println( "World is " + ( d.contains( "World" ) ? "in" : "out" ) );
    System.out.println( "Hell is " + ( d.contains( "Hell" ) ? "in" : "out" ) );
    System.out.println( "Hal is " + ( d.contains( "Hal" ) ? "in" : "out" ) );
    System.out.println( "Hel is " + ( d.contains( "Hel" ) ? "in" : "out" ) );
    System.out.println( "H is " + ( d.contains( "H" ) ? "in" : "out" ) );
  }
}

119

answered Oct 21 '22 11:10

OldCurmudgeon

Related questions
                            
                                Eclipse - how to set javadoc warnings only to specific folders, not the whole project
                            
                                Execute javascript function from java
                            
                                Single transaction using multiple connections. (MYSQL/JDBC)
                            
                                Get unique file id in Windows with Java?
                            
                                Class serialization help (org.omg.CORBA.MARSHAL: WARNUNG: IOP00810011: Exception... error)
                            
                                Getting Javassist types instead of actual Hibernate entity types
                            
                                separate logger for each instance of a library
                            
                                Display info messages in jsp for Spring WebFlow 2.0
                            
                                JPopupMenu closes when child popup opens
                            
                                Wait for threads to write changes in the shutdownhook
                            
                                EntityManager.contains() returns false after persist()
                            
                                Are there any Java swing GUI app builder tool that are not IDE specific? [closed]
                            
                                Decoupling Spring MVC's Controller from the HTTPServlet
                            
                                JPA. JoinTable and two JoinColumns
                            
                                Automatically create testcode from class (jars
                            
                                AWS Java TransferManager issue, JVM could not obtain any more native threads
                            
                                How to get screenshot of any Linux/Windows application running outside of the JVM
                            
                                Converting OpenCV code from C++ to Java
                            
                                How to match similar colours in Java using getRGB
                            
                                Convert JSON date format

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to find identical byte[]-objects in two arrays concurrently?

Tags:

java

multithreading

concurrency

hash

hash-collision

Florian Pilz

People also ask

1 Answers

OldCurmudgeon

Recent Activity

Donate For Us