Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a Pythonic way to alter a dict with a key and multiple values to get the desired output?

Problem at hand:

I have the following list of tuples (ID, Country) that I will eventually store in a MySQL table.

mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'), (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]

I want to treat the 'Other' and 'Unknown' using the following condition :

Value       Replaced by => This value
----------------------------------------
Other & Unknown         => Other
A country & Other       => Country
A country & Unknown     => Country

Python :

def refinelist(mylist):

    '''Updating the list to remove unwanted values'''
    '''
    Other & Unknown => Other
    A country & Other => Country
    A country & Unknown => Country
    '''

    if 'Other' in mylist and 'Unknown' in mylist:
        print 'remove unknown'
        mylist.remove('Unknown')
    if 'Other' in mylist and len(mylist) >= 2:
        print 'remove other'
        mylist.remove('Other')
    if 'Unknown' in mylist and len(mylist) >= 2:
        print 'remove unknown'
        mylist.remove('Unknown')

    return mylist

def main():

    mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'), (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]

    d = {}

    for x,y in mylist:
        d.setdefault(x, []).append(y)

    # Clean the list values    
    for each in d:
        d[each] = refinelist(d[each])

    ## Convert dict to list of tuples for database entry

    outlist = []

    #result = [(key, value) for key,value in d.keys(), value in d.values()]  ## Couldn't get this to work. Can the below loop be written as list comprehension with minimal footprint?

    for key, value in d.items():
        if len(value) == 1:
            print key, value[0]
            outlist.append((key, value[0]))
        elif len(value) > 1:
            for eachval in value:
                print key, eachval
                outlist.append((key, eachval))

    print outlist

if __name__ == "__main__":
    main()    

Output :

remove unknown
remove other
remove unknown
remove other
10 India
11 Other
12 USA
12 UK
[(10, 'India'), (11, 'Other'), (12, 'USA'), (12, 'UK')]

Question :

I have a feeling this can be done more efficiently. Is using a dict overkill?

I start off with a list of tuples (luples), converting it to a dict, performing a clean operation, then converting it back to luples?

I could just insert the original luples in the MySQL table and then deal with 'Unknown' and 'Other' with few queries but I prefer Python for the task.

A pythonic solution or some critics on the code is greatly appreciated.

like image 723
ThinkCode Avatar asked Feb 20 '23 14:02

ThinkCode


2 Answers

Making extensive use of generators and list comprehension, you can write it like this:

other = ['Other', 'Unknown']                        # Strings denoting non-contries
ids = set(i for i,j in mylist)                      # All ids in the list
known = set(i for i,j in mylist if j not in other)  # Ids of real countries
outlist = [k for k in mylist if k[1] not in other]  # Keep all real countries
outlist.extend((i, other[0]) for i in ids - known)  # Append "Other" for all IDs with no real country

The result will be

[(10, 'India'), (12, 'USA'), (12, 'UK'), (11, 'Other')]

If order matters, this will mean more work.

like image 87
MvG Avatar answered Feb 22 '23 05:02

MvG


For one thing, your code results in a bunch of expensive list operations with each remove call. If order matters, you can do the following, just by sorting first and then going through the list just one more time. (I wrote this as a generator so that you (1) don't have to create a list if you don't need to (e.g. if you were going to add this right into the db) and (2) so that you avoid all the append operations.

def filter_list(lst):
    lst = sorted(lst)
    curr_id = lst[0][0]
    found_country = False
    for id, elem in lst:
        if id != curr_id:
            if not found_country:
                yield (curr_id, "Other")
            curr_id = id
            found_country=False
        if elem not in ("Other", "Unknown"):
            yield (curr_id, elem)
            found_country = True

Use list(filter_list(input_list)) if you just want to get a list back. (freely admit it's not the most elegant)

like image 32
Jeff Tratner Avatar answered Feb 22 '23 04:02

Jeff Tratner