
Insert missing dates while keeping the date order in python list

I have a list of lists containing [yyyy, value] items, with each sub-list ordered by increasing year. Here is a sample:

A = [
    [[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2013, 17]], 
    [[2008, 6], [2009, 3], [2011, 1], [2013, 6]], [[2013, 9]], 
    [[2008, 4], [2011, 1], [2013, 4]], 
    [[2010, 3], [2011, 3], [2013, 1]], 
    [[2008, 2], [2011, 4], [2013, 1]], 
    [[2009, 1], [2010, 1], [2011, 3], [2013, 3]], 
    [[2010, 1], [2011, 1], [2013, 5]], 
    [[2011, 1], [2013, 4]], 
    [[2009, 1], [2013, 4]], 
    [[2008, 1], [2013, 3]], 
    [[2009, 1], [2013, 2]], 
    [[2013, 2]], 
    [[2011, 1], [2013, 1]],
    [[2013, 1]], 
    [[2013, 1]], 
    [[2011, 1]], 
    [[2011, 1]]
    ]

What I need is to insert all the missing years between min(year) and max(year) and to make sure that the order is preserved. So, for example, taking the first sub-list of A:

[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2013, 17]

should look like:

[min_year, 0]...[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2012, 0],[2013, 17],..[max_year, 0]

Moreover, if a sub-list contains only a single item, the same process should apply to it: the original item keeps its place in the ordering, and the remaining years from min to max are inserted around it with a value of 0.

Any ideas?

Thanks.

user2480542 asked Jul 25 '13 18:07


3 Answers

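minyear = 2008
maxyear = 2013
new_a = []
for group in A:
    # Years already present in this sub-list
    years = set(point[0] for point in group)
    # Build a [year, 0] entry for every missing year, leaving A itself untouched
    missing = [[year, 0] for year in range(minyear, maxyear + 1) if year not in years]
    # Sorting the [year, value] pairs restores the year order
    new_a.append(sorted(group + missing))
print(new_a)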

This produces:

[   [[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2012, 0], [2013, 17]],
    [[2008, 6], [2009, 3], [2010, 0], [2011, 1], [2012, 0], [2013, 6]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 9]],
    [[2008, 4], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 4]],
    [[2008, 0], [2009, 0], [2010, 3], [2011, 3], [2012, 0], [2013, 1]],
    [[2008, 2], [2009, 0], [2010, 0], [2011, 4], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 1], [2010, 1], [2011, 3], [2012, 0], [2013, 3]],
    [[2008, 0], [2009, 0], [2010, 1], [2011, 1], [2012, 0], [2013, 5]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 4]],
    [[2008, 0], [2009, 1], [2010, 0], [2011, 0], [2012, 0], [2013, 4]],
    [[2008, 1], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 3]],
    [[2008, 0], [2009, 1], [2010, 0], [2011, 0], [2012, 0], [2013, 2]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 2]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 0]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 0]]]
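
The year bounds above are hard-coded. If they should instead come from the data, as the question's min(year) and max(year) wording suggests, they could be derived first. A minimal sketch, assuming a small sample in place of A; the helper name `year_bounds` is made up for illustration:

```python
def year_bounds(groups):
    """Return (smallest year, largest year) found across all sub-lists."""
    years = [point[0] for group in groups for point in group]
    return min(years), max(years)

# Small sample standing in for A (an assumption, not the full data set)
sample = [[[2008, 5], [2010, 2]], [[2013, 9]]]
minyear, maxyear = year_bounds(sample)
print(minyear, maxyear)
```

The resulting minyear and maxyear can then be used in the loop above in place of the literal values.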
tehsockz answered Oct 11 '22 05:10


Here you go, hope you like it!

min_year = 2007 # for testing purposes I used these years
max_year = 2014

final_list = [] # you're going to be adding the corrected values to this list

for outer in A: # start by iterating through each outer list in A
    active_years = {} # use this dictionary to keep track of which years are in each list and their values

    for inner in outer: # now iterate through each [year, value] pair in the outer list and create a dictionary entry for each
        active_years[inner[0]] = inner[1] # the key is the year (index 0 of inner), the value is index 1

    new_outer = [] # this will be your new outer list
    for year in range(min_year, max_year + 1): # now add all the other years to active_years and give them value 0
        if year not in active_years: # only add the years not in your dictionary already
            active_years[year] = 0

    for entry in sorted(active_years): # iterate through the keys in year order; a plain dict does not hand them back sorted
        new_outer.append([entry, active_years[entry]]) # build your new outer list from [year, value] pairs
    final_list.append(new_outer) # add to the final_list

print(final_list) # presto
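
The same dictionary idea can be written more compactly by walking the year range directly and letting `dict.get` supply the 0 for absent years. This is only a sketch under assumptions: the function name `fill_years` and the sample data are made up for illustration, and unlike the loop above it drops any year that falls outside min_year..max_year.

```python
def fill_years(groups, min_year, max_year):
    """Return a new list in which every sub-list covers min_year..max_year."""
    filled = []
    for group in groups:
        values = {year: value for year, value in group}  # year -> value lookup
        # Walk the full range in order; years that are absent default to 0
        filled.append([[year, values.get(year, 0)]
                       for year in range(min_year, max_year + 1)])
    return filled

# Small sample standing in for A (an assumption, not the full data set)
sample = [[[2008, 5], [2010, 2]], [[2009, 9]]]
print(fill_years(sample, 2008, 2010))
```

Because the output is built by iterating over the range, the year order is preserved without a separate sort.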
ZekeDroid answered Oct 11 '22 05:10


How about:

import numpy as np

def np_fill(data, min_year, max_year):

    # Set up an array of [year, 0] rows for every year, one block per sub-list
    year_range = np.arange(min_year, max_year + 1)
    unit = np.dstack((year_range, np.zeros(max_year - min_year + 1)))
    overall = np.tile(unit, (len(data), 1, 1)).astype(int)  # np.int was removed in NumPy 1.24

    # Change the list to a list of ndarrays
    data = [np.array(line) for line in data]

    for num, line in enumerate(data):

        # Find correct indices and update overall array
        index = np.searchsorted(year_range, line[:, 0])
        overall[num, index, 1] = line[:, 1]
    return overall

Run the code:

print(np_fill(A, 2008, 2013)[:2])

[[[2008    5]
  [2009    5]
  [2010    2]
  [2011    5]
  [2012    0]
  [2013   17]]

 [[2008    6]
  [2009    3]
  [2010    0]
  [2011    1]
  [2012    0]
  [2013    6]]]


print(np_fill(A, 2008, 2013).shape)
(18, 6, 2)

You have a duplicate for year 2013 in the second line of A, not sure if this is purposeful or not.

Here are a few timings, because I was curious; the source code can be found here. Please let me know if you find an error.

For start year / end year- (2008,2013):

np_fill took 0.0454630851746 seconds.
tehsockz_fill took 0.00737619400024 seconds.
zeke_fill_fill took 0.0146050453186 seconds.

This is roughly what I expected: converting to numpy arrays takes a lot of time. To break even, it looks like the span of years needs to be about 30:

For start year / end year- (1985,2013):

np_fill took 0.049400806427 seconds.
tehsockz_fill took 0.0425939559937 seconds.
zeke_fill_fill took 0.0748357772827 seconds.

Numpy of course does progressively better from there. If you need to return a numpy array for whatever reason, the numpy algorithm is always faster.
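
For anyone wanting to repeat this kind of comparison, a timing harness along these lines could be used. It is a sketch only, not the benchmark behind the numbers above: `time_fill` and `pure_python_fill` are hypothetical names, the sample data is a small stand-in for A, and the repeat count is arbitrary.

```python
import timeit

def pure_python_fill(groups, min_year, max_year):
    """Stand-in filler (hypothetical): pad each sub-list with [year, 0] entries."""
    result = []
    for group in groups:
        present = set(point[0] for point in group)
        missing = [[year, 0] for year in range(min_year, max_year + 1)
                   if year not in present]
        result.append(sorted(group + missing))
    return result

def time_fill(func, groups, min_year, max_year, repeats=100):
    """Return the total seconds taken by `repeats` calls of func."""
    return timeit.timeit(lambda: func(groups, min_year, max_year), number=repeats)

# Small sample standing in for A (an assumption, not the full data set)
sample = [[[2008, 5], [2013, 17]], [[2011, 1]]]
elapsed = time_fill(pure_python_fill, sample, 2008, 2013)
print("pure_python_fill took %s seconds." % elapsed)
```

Any of the fill functions in the answers could be passed as `func`, provided they are wrapped to take the same three arguments.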

Daniel answered Oct 11 '22 06:10