Interview question: remove duplicates from an unsorted linked list

Tags: java, c#, linked-list

I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I'm trying to solve the following question:

2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?
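For context, the buffered variant that the follow-up rules out could look roughly like the Java sketch below. It is not the book's code: the `BufferedDedup` class, its nested `Node` type, and the `listOf`/`render` helpers are hypothetical names made up for illustration, and the sketch assumes integer payloads.

```java
import java.util.HashSet;
import java.util.Set;

final class BufferedDedup {
    // Hypothetical minimal node type; not the book's LinkedListNode.
    static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Single pass; the set is the "temporary buffer" of values seen so far.
    static void deleteDups(Node head) {
        Set<Integer> seen = new HashSet<>();
        Node previous = null;
        for (Node current = head; current != null; current = current.next) {
            if (seen.add(current.data)) {
                previous = current;            // first occurrence: keep it
            } else {
                previous.next = current.next;  // repeat: unlink it
            }
        }
    }

    // Illustrative helper: builds a list from the given values.
    static Node listOf(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // Illustrative helper: renders a list as space-separated values.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.data);
        }
        return sb.toString();
    }
}
```

This trades O(N) extra memory for a single pass, which is the trade-off the follow-up question takes away.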

I'm solving it in C#, so I made my own Node class:

public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}

My solution is to iterate through the list and, for each node, iterate through the remainder of the list and remove any duplicates (note that I haven't actually compiled or tested this, as instructed by the book):

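// Values are compared with object.Equals (== on a generic T compares references),
// and the inner pointer only advances when nothing was unlinked, so runs of
// adjacent duplicates are not skipped.
public static void RemoveDuplicates<T>(Node<T> head) where T : class
{
    // Iterate through the list
    Node<T> iter = head;
    while (iter != null)
    {
        // Iterate to the remaining nodes in the list
        Node<T> current = iter;
        while (current.Next != null)
        {
            if (object.Equals(iter.Value, current.Next.Value))
            {
                current.Next = current.Next.Next;
            }
            else
            {
                current = current.Next;
            }
        }

        iter = iter.Next;
    }
}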

Here is the solution from the book (the author wrote it in Java):

Without a buffer, we can iterate with two pointers: “current” does a normal iteration, while “runner” iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.

public static void deleteDups2(LinkedListNode head) 
{
    if (head == null) return;

    LinkedListNode previous = head;
    LinkedListNode current = previous.next;

    while (current != null) 
    {
        LinkedListNode runner = head;

        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data) 
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}

So my solution always looks for duplicates from the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book, or is it significantly worse?

asked Dec 27 '10 23:12

Kiril

2 Answers

If you give a person a fish, they eat for a day. If you teach a person to fish...

My measures for the quality of an implementation are:

Correctness: If you aren't getting the right answer in all cases, then it isn't ready
Readability/maintainability: Look at code repetition, understandable names, the number of lines of code per block/method (and the number of things each block does), and how difficult it is to trace the flow of your code. Look at any number of books focused on refactoring, programming best-practices, coding standards, etc, if you want more information on this.
Theoretical performance (worst-case and amortized): Big-O is a metric you can use. CPU and memory consumption should both be measured
Complexity: Estimate how long it would take an average professional programmer to implement (if they already know the algorithm). See if that is in line with how difficult the problem actually is

As for your implementation:

Correctness: I suggest writing unit tests to determine this for yourself and/or debugging it (on paper) from start to finish with interesting sample/edge cases. Null, one item, two items, various numbers of duplicates, etc
Readability/maintainability: It looks mostly fine, though your last two comments don't add anything. It is a bit more obvious what your code does than the code in the book
Performance: I believe both are N-squared. Whether the amortized cost is lower on one or the other I'll let you figure out :)
Time to implement: An average professional should be able to code this algorithm in their sleep, so looking good
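The edge cases listed under correctness (null, one item, two items, various numbers of duplicates) can be turned into a small check harness. The sketch below is a Java port of the forward-scanning approach, not the asker's C# verbatim: `ForwardScanDedup`, its nested `Node`, and the `listOf`/`render`/`check` helpers are hypothetical names, and values are assumed to be non-null strings. Note that this port advances the inner pointer only when nothing was removed; advancing unconditionally would step over the second node in a run such as a, a, a.

```java
final class ForwardScanDedup {
    // Hypothetical node type for this sketch; values are assumed non-null.
    static final class Node {
        final String value;
        Node next;
        Node(String value) { this.value = value; }
    }

    // For each node, unlink every later node holding an equal value.
    static void removeDuplicates(Node head) {
        for (Node iter = head; iter != null; iter = iter.next) {
            Node current = iter;
            while (current.next != null) {
                if (iter.value.equals(current.next.value)) {
                    current.next = current.next.next; // unlink, then re-check the new neighbour
                } else {
                    current = current.next;           // advance only when nothing was removed
                }
            }
        }
    }

    // Illustrative helper: builds a list from the given values (null if empty).
    static Node listOf(String... values) {
        Node head = null, tail = null;
        for (String v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // Illustrative helper: renders a list as space-separated values.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.value);
        }
        return sb.toString();
    }

    // Runs one case: dedupes the input and compares it with the expected rendering.
    static boolean check(String expected, String... input) {
        Node head = listOf(input);
        removeDuplicates(head);
        return expected.equals(render(head));
    }

    public static void main(String[] args) {
        System.out.println("empty list:        " + check(""));
        System.out.println("one item:          " + check("a", "a"));
        System.out.println("two equal items:   " + check("a", "a", "a"));
        System.out.println("run of three:      " + check("a", "a", "a", "a"));
        System.out.println("mixed duplicates:  " + check("a b c", "a", "b", "a", "c", "b", "c", "c"));
    }
}
```

The "run of three" case is the interesting one: it is the smallest input that separates a version that re-checks after unlinking from one that does not.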

answered Sep 19 '22 06:09

Merlyn Morgan-Graham

There's not much of a difference. If I've done my math right, yours is on average N/16 slower than the author's, but plenty of cases exist where your implementation will be faster.

Edit:

I'll call your implementation Y and the author's A

Both proposed solutions have O(N^2) as their worst case, and both have a best case of O(N) when all elements have the same value.

EDIT: This is a complete rewrite. Inspired by the debate in the comments, I tried to find the average case for N random numbers, that is, a sequence with a random size and a random distribution. What would the average case be?

The outer loop of Y will always run U times, where U is the number of unique numbers. Each iteration does N-X comparisons, where X is the number of elements removed prior to that iteration (+1). On the first iteration no element will have been removed, and by the second iteration N/U will have been removed on average.

That is, on average ½N will be left to iterate over. We can express the average cost as U*½N. The average U can be expressed in terms of N as well.

Expressing A is more difficult. Let's say we use I iterations before we've encountered all unique values. After that, each iteration will run between 1 and U comparisons (on average that's U/2), and we will do that N-I times.

I*c+U/2(N-I)

But what's the average number of comparisons (c) we run for each of the first I iterations? On average we need to compare against half of the elements already visited, and on average we've visited I/2 elements, i.e. c=I/4

(I^2)/4+U/2(N-I).

I can be expressed in terms of N. On average we'll need to visit half of N to find the unique values, so I=N/2, yielding an average of

(I^2)/4+U/2(N-I) which can be reduced to (3*N^2)/16.

That is, of course, if my estimates of the averages are correct. That is, on average over any potential sequence A has N/16 fewer comparisons than Y, but plenty of cases exist where Y is faster than A. So I'd say they are equal in terms of the number of comparisons.
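One way to sanity-check estimates like these is to count comparisons directly rather than derive them. The Java sketch below instruments both approaches: `countForward` stands in for Y and `countBackward` for A. Everything here is an assumption made for illustration (the `ComparisonCount` class, its nested `Node`, the `averages` helper, and the choice of list length, value range, trial count, and seed in `main`), and it counts value comparisons only, not pointer moves.

```java
import java.util.Random;

final class ComparisonCount {
    // Hypothetical node type for this sketch.
    static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Illustrative helper: builds a list from the given values.
    static Node listOf(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // Y: for each node, scan forward and unlink later equal nodes.
    static long countForward(Node head) {
        long comparisons = 0;
        for (Node iter = head; iter != null; iter = iter.next) {
            Node current = iter;
            while (current.next != null) {
                comparisons++;
                if (iter.data == current.next.data) {
                    current.next = current.next.next;
                } else {
                    current = current.next;
                }
            }
        }
        return comparisons;
    }

    // A: for each node, scan the already-deduplicated prefix for an equal value.
    static long countBackward(Node head) {
        if (head == null) return 0;
        long comparisons = 0;
        Node previous = head;
        Node current = head.next;
        while (current != null) {
            boolean removed = false;
            for (Node runner = head; runner != current; runner = runner.next) {
                comparisons++;
                if (runner.data == current.data) {
                    previous.next = current.next; // unlink current
                    current = current.next;
                    removed = true;
                    break;
                }
            }
            if (!removed) {
                previous = current;
                current = current.next;
            }
        }
        return comparisons;
    }

    // Average comparison counts over random lists of length n with values in [0, range).
    static double[] averages(int n, int range, int trials, long seed) {
        Random rng = new Random(seed);
        long forward = 0, backward = 0;
        for (int t = 0; t < trials; t++) {
            int[] values = new int[n];
            for (int i = 0; i < n; i++) values[i] = rng.nextInt(range);
            forward += countForward(listOf(values));   // each call gets its own copy of the list
            backward += countBackward(listOf(values));
        }
        return new double[] { (double) forward / trials, (double) backward / trials };
    }

    public static void main(String[] args) {
        double[] avg = averages(1000, 500, 200, 42L);
        System.out.printf("Y (forward): %.1f  A (backward): %.1f%n", avg[0], avg[1]);
    }
}
```

In this sketch the two counters should agree on any input: a node holding the j-th distinct value (in order of first appearance) is compared j times if it is a repeat and j-1 times if it is a first occurrence, whichever direction the scan runs. That is in line with the conclusion that the two are equal in number of comparisons; any practical difference would have to come from other costs, which this sketch does not measure.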

answered Sep 18 '22 06:09

Rune FS
