Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

working with HUGE lists in python

how can I manage a huge list of 100+ million strings? How can i begin to work with such a huge list?

example large list:

cards = [
            "2s","3s","4s","5s","6s","7s","8s","9s","10s","Js","Qs","Ks","As"
            "2h","3h","4h","5h","6h","7h","8h","9h","10h","Jh","Qh","Kh","Ah"
            "2d","3d","4d","5d","6d","7d","8d","9d","10d","Jd","Qd","Kd","Ad"
            "2c","3c","4c","5c","6c","7c","8c","9c","10c","Jc","Qc","Kc","Ac"
           ]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print str(len(hands)) + " hand combinations in texas holdem poker"
like image 731
scott Avatar asked Mar 07 '13 23:03

scott


4 Answers

With lots and lots of memory. Python lists and strings are actually reasonably efficient, so provided you've got the memory, it shouldn't be an issue.

That said, if what you're storing are specifically poker hands, you can definitely come up with more compact representations. For example, you can use one byte to encode each card, which means you only need one 64 bit int to store an entire hand. You could then store these in a NumPy array, which would be significantly more efficient than a Python list.

For example:

>>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
>>> import numpy as np
>>> hands = np.zeros(133784560, dtype='7int8') # 133784560 == 52c7
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]

And to speed up that last line a bit: hands[num] = map(cards_to_bytes.__getitem__, hand)

This will only require 7 * 133784560 = ~1gb of memory… And that could be cut down if you pack four cards into each byte (I don't know the syntax for doing that off the top of my head…)

like image 170
David Wolever Avatar answered Nov 01 '22 02:11

David Wolever


If you just want to loop over all possible hands to count them or to find one with a certain property, there is no need to store them all in memory.

You can just use the iterator and not convert to a list:

from itertools import combinations

cardsInHand = 7
hands = combinations(cards,  cardsInHand)

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

print n, "hand combinations in texas holdem poker."

85900584 hand combinations in texas holdem poker.

like image 35
Junuxx Avatar answered Nov 01 '22 01:11

Junuxx


Another memory-less option which allow you to create a stream of data for processing however you like is to use generators. For example.

Print the total number of hands:

sum (1 for x in combinations(cards, 7))

Print the number of hands with the ace of clubs in it:

sum (1 for x in combinations(cards, 7) if 'Ac' in x)
like image 36
Andrew Prock Avatar answered Nov 01 '22 01:11

Andrew Prock


There's often a trade-off between how long you spend coding and how long your code takes to run. If you're just trying to get something done quickly and don't expect it to run frequently, an approach like you're suggesting is fine. Just make the list huge -- if you don't have enough RAM, your system will churn virtual memory, but you'll probably get your answer faster than learning how to write a more sophisticated solution.

But if this is a system that you expect to be used on a regular basis, you should figure out something other than storing everything in RAM. An SQL database is probably what you want. They can be very complex, but because they are nearly ubiquitous there are plenty of excellent tutorials out there.

You might look to a well-documented framework like django which simplifies access to a database through an ORM layer.

like image 1
Leopd Avatar answered Nov 01 '22 03:11

Leopd