Python cartesian product of n lists with n unknown at coding time

Tags:

python

Question

what is the best way to generate a cartesian product of some lists, not knowing in advance how many lists there are?

You can stop reading here if you like.

Background

I don't have money for school so I am trying to teach myself some programming using the Internet whilst working night shifts at a highway tollbooth. I have decided to try to solve some "programming challenge" problems as an exercise.

Programming assignment

Here's the problem I am trying to tackle, property of TopCoder:

http://community.topcoder.com/stat?c=problem_statement&pm=3496

I will not copy and paste the full description to respect their copyright notice but I am assuming I can summarise it, provided I don't use pieces of it verbatim (IANAL though).

Summary

If a "weighted sum" of historical stock prices is the sum of addenda obtained by multiplying a subset of these prices by an equal number of "weighting" factors, provided the latter add up to 1.0 and are chosen from the given set of valid values [-1.0, -0.9, ..., 0.9, 1.0], use this formula on all historical data supplied as an argument to your function, examining 5 prices at a time, predicting the next price and returning the permutation of "weighting factors" that yields the lowest average prediction error. There will be at least 6 stock prices in each run so at least one prediction is guaranteed, final results should be accurate within 1E-9.

Test data

Format:

One row for input data, in list format
One row for the expected result
One empty row as a spacer

Download from:

http://paste.ubuntu.com/1229857/

My solution

import itertools

# For a permutation of factors to be used in a weighted sum, it should be chosen
# such than the sum of all factors is 1.
WEIGHTED_SUM_TOTAL = 1.0
FACTORS_CAN_BE_USED_IN_WEIGHTED_SUM = lambda x: sum(x) == WEIGHTED_SUM_TOTAL

# Historical stock price data should be examined using a sliding window of width
# 5 when making predictions about the next price.
N_RECENT_PRICES = 5

# Valid values for weighting factors are: [-1.0, -0.9, ..., 0.9, 1.0]
VALID_WEIGHTS = [x / 10. for x in range(-10, 11)]

# A pre-calculated list of valid weightings to consider. This is the cartesiant
# product of the set of valid weigths considering only the combinations which
# are valid as components of a weighted sum.
CARTESIAN_PRODUCT_FACTORS = [VALID_WEIGHTS] * N_RECENT_PRICES
ALL_PERMUTATIONS_OF_WEIGHTS = itertools.product(*CARTESIAN_PRODUCT_FACTORS)
WEIGHTED_SUM_WEIGHTS = filter(FACTORS_CAN_BE_USED_IN_WEIGHTED_SUM,
                              ALL_PERMUTATIONS_OF_WEIGHTS)

# Generator function to get sliding windows of a given width from a data set
def sliding_windows(data, window_width):

  for i in range(len(data) - window_width):
    yield data[i:i + window_width], data[i + window_width]

def avg_error(data):

  # The supplied data will guarantee at least one iteration
  n_iterations = len(data) - 5

  best_average_error = None

  # Consider each valid weighting (e.g. permutation of weights)
  for weighting in WEIGHTED_SUM_WEIGHTS:

    # Keep track of the prediction errors for this weighting
    errors_for_this_weighting = []

    for historical_data, next_to_predict in sliding_windows(data,
                                                            N_RECENT_PRICES):

      prediction = sum([a * b for a, b in zip(weighting, historical_data)])
      errors_for_this_weighting.append(abs(next_to_predict - prediction))

    average_error = sum(errors_for_this_weighting) / n_iterations

    if average_error == 0: return average_error

    best_average_error = (average_error if not best_average_error else
      min(average_error, best_average_error))

  return best_average_error

def main():
  with open('data.txt') as input_file:
    while True:
        data = eval(input_file.readline())
        expected_result = eval(input_file.readline())
        spacer = input_file.readline()
        if not spacer:
          break
        result = avg_error(data)
        print expected_result, result, (expected_result - result) < 1e-9

if __name__ == '__main__':
    main()

My question

I am not asking for a code review of my solution because this would be the wrong StackExchange forum for that. I would post my solution to "Code Review" in that case.

My question instead is small, precise and unambiguous, fitting this site's format (hopefully).

In my code I generate a cartesian product of lists using itertools. In essence, I do not solve the crux of the problem myself but delegate the solution to a library that does that for me. I think this is the wrong approach to take if I want to learn from doing these exercises. I should be doing the hard part myself, otherwise why do the exercise at all? So I would like to ask you:

what is the best way to generate a cartesian product of some lists, not knowing in advance how many lists there are?

That's all I'd like to know, you can critique my code if you like. That's welcome, even if it passes all the tests (there is always a better way of doing things, especially if you are a beginner like me) but for this question to be "just right" for SO, I am focussing on just one aspect of the code, a concrete problem I have and something I am not happy with. Let me tell you more, I'll also share the canonical "what have you tried already"...

Clearly if I knew the number of lists I could just type in some nested for loops, like the top solvers of this exercise did in the competition. I tried writing a function that does this for an unknown number of lists but I was not sure which approach to take. The first approach was to write a recursive function. From list 1, take element 1 and combine it with element 1 of list 2, then element 1 of list 3, etc. I would push onto a stack the elements from each "layer" and pop them upon reaching the desired depth. I would imagine I would not fear a "stack overflow" because the depth reachable would be reasonable. I then struggled to choose a data structure to do this in the most efficient (memory/space) way possible, without passing too many parameters to the recursive calls. Should the data structure exist outside of the calls? Be passed around in the calls? Could I achieve any level of parallelism? How? With so many questions and so few answers I realised I needed to just know more to solve this problem and I could use a nudge in the right direction. You could provide a code snippet and I would study it. Or just explain to me what is the right "Computer Science" way of handling this type of problem. I am sure there is something I am not considering.

Finally, the thing that I did consider in my solution above is that thankfully filter filters a generator so the full cartesian product is never kept in memory (as it would if I did a list(ALL_PERMUTATIONS_OF_WEIGHTS) at any time in the code) so I am occupying space in memory only for those combinations which can actually be used as a weighted sum. A similar caution would be nice if applied to whatever system allows me to generate the cartesian product without using itertools.

720

asked Sep 30 '12 01:09

Robottinosino

1 Answers

Think of how numbers are written (in the decimal system, or in any other system). Include zero's even if you don't need them:

You can see how this looks like a cartesian product of 5 lists list(range(10)) (in this particular case). You can generate this output very easily by incrementing the "lowest" digit, and when it reaches the last one in the list, setting it to the first element and incrementing the "next highest" digit. You would still need for loops, of course, but a very small number. Use a similar approach when you work with arbitrary number of arbitrary lists.

For example, if you have 3 lists: ['a', 'b', 'c'], ['x', 'y'], ['1', '2'], you'll get:

ax1
ax2
ay1
ay2
bx1
bx2
by1
by2
cy1
cy2
cx1
cx2

Good luck!

EDIT:

If you like, here's a sample code to do this. I don't recursion just to show how simple this is. Recursion, of course, is also a great approach.

def lex_gen(bounds):
    elem = [0] * len(bounds)
    while True:
        yield elem
        i = 0
        while elem[i] == bounds[i] - 1:
            elem[i] = 0
            i += 1
            if i == len(bounds):
                raise StopIteration
        elem[i] += 1

def cart_product(lists):
    bounds = [len(lst) for lst in lists]
    for elem in lex_gen(bounds):
        yield [lists[i][elem[i]] for i in range(len(lists))]


for k in cart_product([['1', '2'], ['x', 'y'], ['a', 'b', 'c']]):
    print(k)

164

answered Nov 10 '22 21:11

max

Related questions
                            
                                Does Pyramid have a Signal/Slot system
                            
                                Where to store the configuration files of python applications on Windows
                            
                                Google Closure Linter not working in Sublime Text 2 for Windows
                            
                                Tornado doesn't restart cleanly in supervisor
                            
                                Matplotlib ion() and subprocesses
                            
                                Why is Python's "sorted()" slower than "copy, then .sort()"
                            
                                How do I normalize a hexbin plot?
                            
                                Scapy: how do I get the full IP packet header?
                            
                                Python, Sqlite not saving results on the file
                            
                                Python find x value to corresponding max y value in plot
                            
                                logging errors with flask
                            
                                How to get a random float with step in Python
                            
                                How to preserve form fields in django after unsuccessful submit?
                            
                                How to get rid of maximum recursion depth error while plotting interactively?
                            
                                pickle/zodb: how to handle moving .py files with class definitions?
                            
                                Python copy-on-write behavior
                            
                                A sorted and filtered treemodel in Python Gtk+3..?
                            
                                How do I improve scrapy's download speed?
                            
                                Python Error The _posixsubprocess module is not being used
                            
                                Django on Heroku dumpdata incomplete output

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With