Problem at hand:
I have the following list of tuples (ID, Country) that I will eventually store in a MySQL table.
mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'), (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
I want to handle the 'Other' and 'Unknown' values using the following rules:
Values for one ID        => Replaced by
----------------------------------------
Other & Unknown          => Other
A country & Other        => Country
A country & Unknown      => Country
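The three rules come down to one check per ID: keep the real countries if there are any, otherwise fall back to 'Other'. Here is a minimal sketch of that per-ID rule. The names OTHER_VALUES and resolve are hypothetical, and it assumes an ID whose only value is 'Unknown' should also become 'Other', which the table above does not specify.

```python
# Hypothetical helper: apply the replacement rules to all values of ONE ID.
OTHER_VALUES = ("Other", "Unknown")  # placeholder strings, not real countries

def resolve(values):
    """Return the real countries if any exist, else ['Other'].

    Assumption: an 'Unknown'-only ID also maps to 'Other'.
    """
    countries = [v for v in values if v not in OTHER_VALUES]
    return countries if countries else ["Other"]

print(resolve(["Other", "India", "Unknown"]))  # ['India']
print(resolve(["Other", "Unknown"]))           # ['Other']
print(resolve(["USA", "UK", "Other"]))         # ['USA', 'UK']
```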
Python:
from __future__ import print_function  # lets the print() calls below run on Python 2 and 3

def refinelist(mylist):
    '''Update the list to remove unwanted values:
    Other & Unknown     => Other
    A country & Other   => Country
    A country & Unknown => Country
    '''
    if 'Other' in mylist and 'Unknown' in mylist:
        print('remove unknown')
        mylist.remove('Unknown')
    if 'Other' in mylist and len(mylist) >= 2:
        print('remove other')
        mylist.remove('Other')
    if 'Unknown' in mylist and len(mylist) >= 2:
        print('remove unknown')
        mylist.remove('Unknown')
    return mylist

def main():
    mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'), (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
    # Group the countries by ID
    d = {}
    for x, y in mylist:
        d.setdefault(x, []).append(y)
    # Clean the list values
    for each in d:
        d[each] = refinelist(d[each])
    ## Convert dict to list of tuples for database entry
    outlist = []
    #result = [(key, value) for key,value in d.keys(), value in d.values()] ## Couldn't get this to work. Can the loop below be written as a list comprehension with minimal footprint?
    for key, value in d.items():
        if len(value) == 1:
            print(key, value[0])
            outlist.append((key, value[0]))
        elif len(value) > 1:
            for eachval in value:
                print(key, eachval)
                outlist.append((key, eachval))
    print(outlist)

if __name__ == "__main__":
    main()
Output:
remove unknown
remove other
remove unknown
remove other
10 India
11 Other
12 USA
12 UK
[(10, 'India'), (11, 'Other'), (12, 'USA'), (12, 'UK')]
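For reference, the flattening loop that the comment in main() asks about can be written as a nested list comprehension. This is a sketch assuming d already holds the cleaned groups shown in the output above; the len(value) == 1 branch is unnecessary, because a one-element list simply produces one pair. The output order shown relies on dicts keeping insertion order (guaranteed from Python 3.7).

```python
# Cleaned groups, as produced by refinelist() in the question's main()
d = {10: ['India'], 11: ['Other'], 12: ['USA', 'UK']}

# One (key, value) pair per value in each group; single-value groups need no special case
outlist = [(key, value) for key, values in d.items() for value in values]
print(outlist)  # [(10, 'India'), (11, 'Other'), (12, 'USA'), (12, 'UK')]
```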
Question:
I have a feeling this can be done more efficiently. Is using a dict overkill?
I start off with a list of tuples ("luples"), convert it to a dict, perform a clean-up operation, and then convert it back to luples. Is that round trip necessary?
I could just insert the original luples into the MySQL table and then deal with 'Unknown' and 'Other' with a few queries, but I prefer to use Python for the task.
A Pythonic solution or a critique of the code is greatly appreciated.
Making extensive use of generator expressions and list comprehensions, you can write it like this:
other = ['Other', 'Unknown'] # Strings denoting non-countries
ids = set(i for i,j in mylist) # All ids in the list
known = set(i for i,j in mylist if j not in other) # Ids of real countries
outlist = [k for k in mylist if k[1] not in other] # Keep all real countries
outlist.extend((i, other[0]) for i in ids - known) # Append "Other" for all IDs with no real country
The result will be
[(10, 'India'), (12, 'USA'), (12, 'UK'), (11, 'Other')]
If order matters, this will take more work: IDs with no real country (here 11) are appended at the end rather than kept in place.
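One way to keep the order is to group in a single pass, keeping IDs in order of first appearance and countries in their original order within each ID. This is a sketch, not the answerer's code: clean_in_order is a hypothetical name, and it assumes Python 3.7+ so that the grouping dict preserves insertion order.

```python
mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'),
          (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
other = ('Other', 'Unknown')  # strings denoting non-countries

def clean_in_order(pairs):
    """Hypothetical helper: same rules as above, but IDs keep first-appearance order."""
    groups = {}  # dicts preserve insertion order on Python 3.7+
    for i, country in pairs:
        groups.setdefault(i, [])  # register the ID even if it has no real country
        if country not in other:
            groups[i].append(country)
    # An empty group falls back to a single "Other" entry
    return [(i, c) for i, countries in groups.items() for c in (countries or [other[0]])]

print(clean_in_order(mylist))
# [(10, 'India'), (11, 'Other'), (12, 'USA'), (12, 'UK')]
```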
For one thing, your code performs a bunch of expensive list operations: each remove() call scans the list and shifts every element after the one it removes. If order matters, you can do the following instead: sort first and then go through the list just once more. I wrote this as a generator so that (1) you don't have to create a list if you don't need one (e.g. if you are going to insert the rows straight into the database) and (2) you avoid all the append operations.
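def filter_list(lst):
    lst = sorted(lst)
    if not lst:  # nothing to filter (and lst[0] below would raise IndexError)
        return
    curr_id = lst[0][0]
    found_country = False
    for item_id, elem in lst:
        if item_id != curr_id:
            # Moving on to a new ID: emit "Other" if the previous one had no real country
            if not found_country:
                yield (curr_id, "Other")
            curr_id = item_id
            found_country = False
        if elem not in ("Other", "Unknown"):
            yield (curr_id, elem)
            found_country = True
    # The loop never sees an ID after the last group, so check that group here
    if not found_country:
        yield (curr_id, "Other")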
Use list(filter_list(input_list)) if you just want to get a list back. (I freely admit it's not the most elegant.)
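A possibly tidier variant of the same sort-then-scan idea lets itertools.groupby do the per-ID bookkeeping that filter_list tracks by hand with curr_id and found_country. This is a sketch under the same assumptions (sorted output, 'Other'/'Unknown' as the only placeholders); filter_list_groupby is a hypothetical name. Because each ID's values arrive as one group, the last group and an empty input need no special handling.

```python
from itertools import groupby
from operator import itemgetter

def filter_list_groupby(lst):
    """Yield cleaned (id, country) pairs, one ID group at a time."""
    for item_id, group in groupby(sorted(lst), key=itemgetter(0)):
        countries = [c for _, c in group if c not in ("Other", "Unknown")]
        if countries:
            for c in countries:
                yield (item_id, c)
        else:
            # No real country for this ID: fall back to "Other"
            yield (item_id, "Other")

mylist = [(10, 'Other'), (10, 'India'), (10, 'Unknown'), (11, 'Other'),
          (11, 'Unknown'), (12, 'USA'), (12, 'UK'), (12, 'Other')]
print(list(filter_list_groupby(mylist)))
# [(10, 'India'), (11, 'Other'), (12, 'UK'), (12, 'USA')]
```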