
Proper Usage of list.append in Python

Tags:

python

I'd like to know why Method 1 is correct and Method 2 is wrong.

Method 1:

def remove_duplicates(x):
    y = []
    for n in x:
        if n not in y:
            y.append(n)
    return y

Method 2:

def remove_duplicates(x):
    y = []
    for n in x:
        if n not in y:
            y = y.append(n)
    return y

I don't understand why the second method returns the wrong answer.

lakshmen asked Dec 04 '22 10:12



1 Answer

The list.append method returns None. So y = y.append(n) sets y to None.

Since y starts out empty, this happens on the first iteration. If that is also the last iteration of the for-loop, then None is returned.

If it is not the last iteration, then on the next pass through the loop,

if n not in y

will raise a

TypeError: argument of type 'NoneType' is not iterable
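
A minimal sketch of both outcomes, reusing Method 2 from the question under the made-up name method2 (the name and the sample inputs are only for illustration):

```python
def method2(x):
    # Method 2 from the question, unchanged apart from the name.
    y = []
    for n in x:
        if n not in y:
            y = y.append(n)  # append mutates y in place and returns None
    return y

# append itself returns None; the list is changed in place.
items = []
result = items.append(1)
print(result, items)

# One item: y is rebound to None on the last iteration, so None is returned.
single = method2([1])
print(single)

# Two or more items: the second iteration evaluates `n not in None`.
raised = False
try:
    method2([1, 2])
except TypeError as exc:
    raised = True
    print(exc)
```

The exact wording of the TypeError message may differ between Python versions, so the sketch prints it rather than comparing against a fixed string.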

Note: Method 1 scans y for every item, so it is quadratic in the worst case. In most cases there are faster ways to remove duplicates, but which one to use depends on whether you wish to preserve order, whether the items are orderable, and whether the items in x are hashable.
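
Why hashability matters can be seen in a small sketch (the sample data and variable names here are assumptions for illustration): a set cannot hold lists, while the membership-test loop of Method 1 only needs == and still works.

```python
rows = [[1, 2], [1, 2], [3]]  # lists are unhashable

try:
    set(rows)
    set_worked = True
except TypeError:
    # set() needs to hash each item, and lists cannot be hashed
    set_worked = False

# The loop from Method 1 only compares items with ==.
unique_rows = []
for row in rows:
    if row not in unique_rows:
        unique_rows.append(row)

print(set_worked, unique_rows)
```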

def unique_hashable(seq):
    # Not order preserving. Use this if the items in seq are hashable, 
    # and you don't care about preserving order.
    return list(set(seq))

def unique_hashable_order_preserving(seq): 
    # http://www.peterbe.com/plog/uniqifiers-benchmark (Dave Kirby)
    # Use this if the items in seq are hashable and you want to preserve the
    # order in which unique items in seq appear.
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

def unique_unhashable_orderable(seq):
    # Author: Tim Peters
    # http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
    # Use this if the items in seq are unhashable, but seq is sortable
    # (i.e. orderable). Note the result does not preserve order because of
    # the sort.
    # 
    # We can't hash all the elements.  Second fastest is to sort,
    # which brings the equal elements together; then duplicates are
    # easy to weed out in a single pass.
    # NOTE:  Python's list.sort() was designed to be efficient in the
    # presence of many duplicate elements.  This isn't true of all
    # sort functions in all languages or libraries, so this approach
    # is more effective in Python than it may be elsewhere.
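    t = list(seq)
    t.sort()  # raises TypeError if the items cannot be ordered
    if not t:
        # Nothing to deduplicate; also avoids indexing t[0] on an empty list.
        return t
    last = t[0]
    lasti = i = 1
    while i < len(t):
        if t[i] != last:
            # Move the next distinct item into the compacted prefix of t.
            t[lasti] = last = t[i]
            lasti += 1
        i += 1
    return t[:lasti]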

def unique_unhashable_nonorderable(seq):
    # Use this (your Method 1) if the items in seq are unhashable and unorderable.
    # This method is order preserving.
    u = []
    for x in seq:
        if x not in u:
            u.append(x)
    return u
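
On Python 3.7 and later, where dict preserves insertion order, the hashable, order-preserving case can also be sketched with dict.fromkeys. This is not one of the recipes above, and the name unique_hashable_dict is made up for illustration:

```python
def unique_hashable_dict(seq):
    # Assumes hashable items and Python 3.7+ (dicts keep insertion order).
    # Duplicate keys collapse; the first occurrence fixes the position.
    return list(dict.fromkeys(seq))

print(unique_hashable_dict([3, 1, 3, 2, 1]))  # [3, 1, 2]
```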

And this may be the fastest if you have NumPy and the items in seq are orderable (note that it returns a NumPy array, not a list):

import numpy as np
def unique_order_preserving_numpy(seq):
    u, ind = np.unique(seq, return_index=True)
    return u[np.argsort(ind)] 
unutbu answered Dec 17 '22 23:12
