Determining a string has all unique characters without using additional data structures and without the lowercase characters assumption

Tags: java, string, algorithm, bit-manipulation, bitvector

This is one of the questions in the Cracking the Coding Interview book by Gayle Laakmann McDowell:

Implement an algorithm to determine if a string has all unique characters. What if you can not use additional data structures?

The author wrote:

We can reduce our space usage a little bit by using a bit vector. We will assume, in the below code, that the string is only lower case 'a' through 'z'. This will allow us to use just a single int.

The author has this implementation:

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0)
            return false;
        checker |= (1 << val);
    }
    return true;
}

Let's say we get rid of the assumption that "the string is only lower case 'a' through 'z'". Instead, the string can contain any kind of character—like ASCII characters or Unicode characters.

Is there a solution as efficient as the author's (or a solution that comes close to being as efficient as the author's)?

user3184017

3 Answers

For the extended ASCII character set (char values 0 to 255), you can represent the 256 bits in four longs; you are basically hand-coding an array.

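public static boolean isUniqueChars(String str) {
    // Four longs act as a hand-coded 256-bit vector: one bit per char value 0..255.
    long checker1 = 0;
    long checker2 = 0;
    long checker3 = 0;
    long checker4 = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i);
        int toCheck = val / 64; // which long holds the bit
        val %= 64;              // bit position inside that long
        // The tests use != 0 rather than > 0: when bit 63 is set the masked
        // value is negative, so > 0 would miss duplicates of '?' and similar.
        switch (toCheck) {
            case 0:
                if ((checker1 & (1L << val)) != 0) {
                    return false;
                }
                checker1 |= (1L << val);
                break;
            case 1:
                if ((checker2 & (1L << val)) != 0) {
                    return false;
                }
                checker2 |= (1L << val);
                break;
            case 2:
                if ((checker3 & (1L << val)) != 0) {
                    return false;
                }
                checker3 |= (1L << val);
                break;
            case 3:
                if ((checker4 & (1L << val)) != 0) {
                    return false;
                }
                checker4 |= (1L << val);
                break;
            default:
                // Char values above 255 do not fit in the four longs.
                throw new IllegalArgumentException("character out of range at index " + i);
        }
    }
    return true;
}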

You can use the following code to generate the body of a similar method for Unicode characters (1,024 longs give 65,536 bits, one for every possible value of a Java char):

static void generate() {
    StringBuilder sb = new StringBuilder();
    // One long per block of 64 char values: 1024 * 64 = 65,536 bits.
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("long checker%d = 0;%n", i));
    }
    sb.append("for (int i = 0; i < str.length(); ++i) {\n"
            + "int val = str.charAt(i);\n"
            + "int toCheck = val / 64;\n"
            + "val %= 64;\n"
            + "switch (toCheck) {\n");
    // The generated tests use != 0 so that bit 63 (a negative masked value) is detected.
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("case %d:\n"
                + "if ((checker%d & (1L << val)) != 0) {\n"
                + "return false;\n"
                + "}\n"
                + "checker%d |= (1L << val);\n"
                + "break;\n", i, i, i));
    }
    sb.append("}\n"
            + "}\n"
            + "return true;");
    System.out.println(sb);
}
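For comparison, here is a minimal sketch of what the generated method is imitating: the same bit test, written with a `long[]` instead of 1,024 separate variables. The class name `UniqueCharsBitArray` and the variable names are illustrative and not part of the answer above. Whether a fixed-size array counts as an "additional data structure" is a judgment call, so treat this as an illustration of the technique rather than a way around the restriction.

```java
final class UniqueCharsBitArray {
    // Hypothetical helper: one bit per possible char value (UTF-16 code unit).
    static boolean isUniqueChars(String str) {
        long[] seen = new long[1024];          // 1024 * 64 = 65,536 bits
        for (int i = 0; i < str.length(); i++) {
            int val = str.charAt(i);
            int word = val >>> 6;              // same as val / 64
            long mask = 1L << (val & 63);      // same as 1L << (val % 64)
            if ((seen[word] & mask) != 0) {    // != 0 so that bit 63 is handled
                return false;
            }
            seen[word] |= mask;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueChars("abc"));   // true
        System.out.println(isUniqueChars("a?b?"));  // false
    }
}
```

The array is 8 KB regardless of the input length, so the extra space is constant in the size of the string but proportional to the size of the alphabet.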

answered Oct 09 '22 18:10

vandale

You only need one line... well, less than one line actually:

if (str.matches("(?s)(?:(.)(?!.*\\1))*"))

This uses a negative lookahead to assert that each character is not repeated later in the string. The outer group is non-capturing so that \\1 refers to the single character just matched, and (?s) lets the dot match line terminators as well.

This approach has a time complexity of O(n^2), because for each of the n characters in the input, all characters that follow (up to n of them) are compared for equality.
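The lookahead idea can be tried out with a small sketch like the one below. It assumes a pattern with a non-capturing outer group, so that `\\1` is the one-character group, and the `(?s)` flag, so that `.` also matches line terminators. The class name `UniqueCharsRegex` and the constant `ALL_UNIQUE` are made up for this example.

```java
import java.util.regex.Pattern;

final class UniqueCharsRegex {
    // Each iteration consumes one character and then asserts, via the
    // negative lookahead, that the same character does not occur later.
    private static final Pattern ALL_UNIQUE =
            Pattern.compile("(?s)(?:(.)(?!.*\\1))*");

    static boolean isUniqueChars(String str) {
        // matches() requires the whole input to match the pattern.
        return ALL_UNIQUE.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUniqueChars("abc")); // true
        System.out.println(isUniqueChars("aba")); // false
    }
}
```

Precompiling the pattern avoids recompiling it on every call, which `String.matches` would otherwise do.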

answered Oct 09 '22 18:10

Bohemian

I think we need a general and practical definition of "additional data structures". Intuitively, we don't want to call every scalar integer or pointer a "data structure", because that makes nonsense of any prohibition of "additional data structures".

I propose we borrow a concept from big-O notation: an "additional data structure" is one that grows with the size of the data set.

In the present case, the code quoted by the OP appears to have a space requirement of O(1) because the bit vector happens to fit into an integer type. But as the OP implies, the general form of the problem is really O(N).

An example of a solution to the general case is to use two pointers and a nested loop to simply compare every character to every other. The space requirement is O(1) but the time requirement is O(N^2).
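The nested-loop approach described above might look like the following sketch. The class name `UniqueCharsQuadratic` is illustrative, and the comparison is on Java `char` values (UTF-16 code units), which is an assumption about what "character" means here.

```java
final class UniqueCharsQuadratic {
    // Compares every character with every later character.
    // Extra space: two int indices, whatever the alphabet size.
    static boolean isUniqueChars(String str) {
        for (int i = 0; i < str.length(); i++) {
            for (int j = i + 1; j < str.length(); j++) {
                if (str.charAt(i) == str.charAt(j)) {
                    return false; // found the same char at two positions
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUniqueChars("abcdef")); // true
        System.out.println(isUniqueChars("hello"));  // false
    }
}
```

No storage depends on the alphabet or the input length, at the cost of roughly N^2/2 comparisons in the worst case.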

answered Oct 09 '22 18:10

A. I. Breveleri
