Best algorithm for deleting duplicates in an array of strings

Today at school the teacher asked us to implement a duplicate-deletion algorithm. It's not that difficult, and everyone came up with the following solution (pseudocode):

for i from 1 to n - 1
    j = i + 1
    while j <= n
        if v[i] == v[j] then
            removeAt(v, j)    // removeAt(from, position); n shrinks by 1
        else
            j = j + 1         // only advance when nothing was removed
        end if
    end while
next i

This algorithm makes up to n(n-1)/2 comparisons. (We're in high school and haven't covered big-O yet, but that seems to be O(n^2).) This solution looks ugly and, of course, slow, so I tried to code something faster:

procedure binarySearch(vector, element, *position)
    // this procedure searches for element in vector, returning
    // true if found, false otherwise. *position will contain the
    // element's place (where it is or where it should be)
end procedure

----

// same type as v
vS = new array[n]

i = 1
while i <= length(v)
    if binarySearch(vS, v[i], &p) = true then
        removeAt(v, i)        // duplicate: remove by position, do not advance i
    else
        add(vS, v[i], p)      // adds v[i] in position p of array vS
        i = i + 1
    end if
end while

This way vS will contain all the elements we've already passed. If element v[i] is in this array, then it is a duplicate and is removed. The computational complexity for the binary search is log(n) and for the main loop (second snippet) is n. Therefore the whole CC is n*log(n) if I'm not mistaken.
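For concreteness, here is a minimal Python sketch of the idea above. It is only an illustration under a few assumptions: the function name `dedupe_with_sorted_seen` is made up, it builds a new list instead of removing from v in place, and it uses Python's `bisect` module for the binary search. One thing the sketch makes visible is that inserting into a sorted array has to shift the elements after the insertion point, so that step can cost up to O(n) by itself even though the search is O(log n).

```python
from bisect import bisect_left


def dedupe_with_sorted_seen(values):
    """Return values without duplicates, keeping first occurrences in order."""
    seen_sorted = []  # plays the role of vS: always kept sorted
    result = []       # built fresh instead of removing from the input
    for item in values:
        # Binary search for the place where item is (or should be).
        pos = bisect_left(seen_sorted, item)
        if pos < len(seen_sorted) and seen_sorted[pos] == item:
            continue  # already seen, so this one is a duplicate
        # Inserting shifts every later element one slot to the right.
        seen_sorted.insert(pos, item)
        result.append(item)
    return result


if __name__ == "__main__":
    print(dedupe_with_sorted_seen(["b", "a", "b", "c", "a"]))
```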

Then I had another idea about using a binary tree, but I can't figure out how to write it down.
Basically my questions are:

Is my CC calculation right? (and, if not, why?)
Is there a faster method for this?

Thanks

551

asked May 20 '11 12:05

BlackBear

1 Answer

The easiest solution is to simply sort the array, which takes O(n log n) with a standard library sort if you are allowed to use one. Otherwise, consider writing a simple randomized quicksort (the code is even on Wikipedia).

Afterwards, scan the array one more time and eliminate consecutive identical elements during that scan.
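A rough Python sketch of the sort-then-scan approach, assuming the built-in `sorted` is allowed (the function name `dedupe_sorted` is just an illustration). Note that the result comes out in sorted order, so the original order of the strings is not preserved.

```python
def dedupe_sorted(values):
    """Sort, then keep an element only if it differs from the previous one kept."""
    ordered = sorted(values)  # O(n log n) comparisons
    result = []
    for item in ordered:      # single linear scan
        # After sorting, equal strings sit next to each other,
        # so comparing with the last kept element is enough.
        if not result or result[-1] != item:
            result.append(item)
    return result


if __name__ == "__main__":
    print(dedupe_sorted(["b", "a", "b", "c", "a"]))
```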

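If you want to do it in O(n) (expected time, assuming a reasonable hash function), you can also use a HashSet holding the elements you have already seen. Iterate once over your array and, for each element, check whether it is in your HashSet.

If it isn't in there, add it. If it is in there, remove it from the array (or, to avoid shifting elements on every removal, copy the elements you keep into a new array).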

Note that this takes some additional memory, and hashing has a constant factor that contributes to your runtime. Although the time complexity is better, the practical runtime will only be faster once you exceed a certain array size.
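The hash-based variant might look like the following in Python, where the built-in `set` stands in for a HashSet. This is a sketch, not a tuned implementation: the name `dedupe_hashed` is hypothetical, and it collects the kept elements in a new list rather than removing from the array in place.

```python
def dedupe_hashed(values):
    """Return values without duplicates, keeping first occurrences in order."""
    seen = set()   # elements encountered so far (extra memory, as noted above)
    result = []
    for item in values:
        if item not in seen:   # average O(1) membership test
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    print(dedupe_hashed(["b", "a", "b", "c", "a"]))
```

Unlike the sorting approach, this one keeps the strings in their original order of first appearance.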

174

answered Sep 25 '22 02:09

b.buchhold

Tags: string, algorithm, complexity-theory, big-o, duplicates
