Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

function to remove duplicate characters in a string

Tags:

java

string

The following code is trying to remove any duplicate characters in a string. I'm not sure if the code is right. Can anybody help me work with the code (i.e whats actually happening when there is a match in characters)?

public static void removeDuplicates(char[] str) {   if (str == null) return;   int len = str.length;   if (len < 2) return;   int tail = 1;   for (int i = 1; i < len; ++i) {     int j;     for (j = 0; j < tail; ++j) {       if (str[i] == str[j]) break;     }     if (j == tail) {       str[tail] = str[i];       ++tail;     }   }   str[tail] = 0; } 
like image 444
Jony Avatar asked Apr 08 '10 07:04

Jony


People also ask

How do you remove duplicate characters or words in a cell string?

Select a cell inside the data which you want to remove duplicates from and go to the Data tab and click on the Remove Duplicates command. Excel will then select the entire set of data and open up the Remove Duplicates window.

Which function duplicates a string?

The C library function to copy a string is strcpy(), which (I'm guessing) stands for string copy.


2 Answers

The function looks fine to me. I've written inline comments. Hope it helps:

// function takes a char array as input. // modifies it to remove duplicates and adds a 0 to mark the end // of the unique chars in the array. public static void removeDuplicates(char[] str) {   if (str == null) return; // if the array does not exist..nothing to do return.   int len = str.length; // get the array length.   if (len < 2) return; // if its less than 2..can't have duplicates..return.   int tail = 1; // number of unique char in the array.   // start at 2nd char and go till the end of the array.   for (int i = 1; i < len; ++i) {      int j;     // for every char in outer loop check if that char is already seen.     // char in [0,tail) are all unique.     for (j = 0; j < tail; ++j) {       if (str[i] == str[j]) break; // break if we find duplicate.     }     // if j reachs tail..we did not break, which implies this char at pos i     // is not a duplicate. So we need to add it our "unique char list"     // we add it to the end, that is at pos tail.     if (j == tail) {       str[tail] = str[i]; // add       ++tail; // increment tail...[0,tail) is still "unique char list"     }   }   str[tail] = 0; // add a 0 at the end to mark the end of the unique char. } 
like image 76
codaddict Avatar answered Oct 02 '22 05:10

codaddict


Your code is, I'm sorry to say, very C-like.

A Java String is not a char[]. You say you want to remove duplicates from a String, but you take a char[] instead.

Is this char[] \0-terminated? Doesn't look like it because you take the whole .length of the array. But then your algorithm tries to \0-terminate a portion of the array. What happens if the arrays contains no duplicates?

Well, as it is written, your code actually throws an ArrayIndexOutOfBoundsException on the last line! There is no room for the \0 because all slots are used up!

You can add a check not to add \0 in this exceptional case, but then how are you planning to use this code anyway? Are you planning to have a strlen-like function to find the first \0 in the array? And what happens if there isn't any? (due to all-unique exceptional case above?).

What happens if the original String/char[] contains a \0? (which is perfectly legal in Java, by the way, see JLS 10.9 An Array of Characters is Not a String)

The result will be a mess, and all because you want to do everything C-like, and in place without any additional buffer. Are you sure you really need to do this? Why not work with String, indexOf, lastIndexOf, replace, and all the higher-level API of String? Is it provably too slow, or do you only suspect that it is?

"Premature optimization is the root of all evils". I'm sorry but if you can't even understand what the original code does, then figuring out how it will fit in the bigger (and messier) system will be a nightmare.


My minimal suggestion is to do the following:

  • Make the function takes and returns a String, i.e. public static String removeDuplicates(String in)
  • Internally, works with char[] str = in.toCharArray();
  • Replace the last line by return new String(str, 0, tail);

This does use additional buffers, but at least the interface to the rest of the system is much cleaner.


Alternatively, you can use StringBuilder as such:

static String removeDuplicates(String s) {     StringBuilder noDupes = new StringBuilder();     for (int i = 0; i < s.length(); i++) {         String si = s.substring(i, i + 1);         if (noDupes.indexOf(si) == -1) {             noDupes.append(si);         }     }     return noDupes.toString(); } 

Note that this is essentially the same algorithm as what you had, but much cleaner and without as many little corner cases, etc.

like image 42
polygenelubricants Avatar answered Oct 02 '22 06:10

polygenelubricants