So I have an array that holds several numbers. As my script runs, more and more numbers are appended to this array. However, I am not interested in all the numbers but just want to keep track of the last 5 numbers.
Currently, I just store all the numbers in the array. However, this array gets really big and it's full of unnecessary information.
I have thought about making a function that when it adds an element to the array, also removes the last element if the array already contains 5 numbers.
I also thought about making a new class to create a data structure that does what I want. However, I only need to reference this array occasionally and is only a small part of the script. So I think it is overkill if I create a whole new class to do this.
What is the best way to do this?
I fully agree with the idea of using Python's limited-length deque
if it's available, and if not, Michael Anderson's simple solution is quite adequate. (I upvoted both) But I just wanted to mention the third option of a ring buffer, which is often used for this kind of task when low memory footprint and high execution speed are important. (In other words, in situations when you probably wouldn't be using Python :-p) For example, the Linux kernel uses this structure to store log messages generated during the boot process, before the system logger starts.
A Python implementation could look like this:
class RingBuffer(object):
def __init__(self, n):
self._buf = [None] * n
self._index = 0
self._valid = 0
def add(self, obj):
n = len(self._buf)
self._buf[self._index] = obj
self._index += 1
if self._index == n
self._index = 0
if self._valid < n:
self._valid += 1
def __len__(self):
return self._valid
# could include other methods for accessing or modifying the contents
Basically what it does is preallocate an array (in Python, a list) of the desired length and fill it with dummy values. The buffer also contains an "index" which points to the next spot in the list that should be filled with a value. Each time a value is added, it's stored in that spot and the index is incremented. When the index reaches the length of the array, it's reset back to zero. Here's an example (I'm using 0
instead of None
for the dummy value just because it's quicker to type):
[0,0,0,0,0]
^
# add 1
[1,0,0,0,0]
^
# add 2
[1,2,0,0,0]
^
# add 3
[1,2,3,0,0]
^
# add 4
[1,2,3,4,0]
^
# add 5
[1,2,3,4,5]
^
# add 6
[6,2,3,4,5]
^
# add 7
[6,7,3,4,5]
^
and so on.
Try using a deque: http://docs.python.org/library/collections.html#deque-objects
"If maxlen is not specified or is None, deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. Bounded length deques provide functionality similar to the tail filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest."
The class can be pretty trivial:
class ListOfFive:
def __init__(self):
self.data = []
def add(self,val):
if len(self.data)==5:
self.data=self.data[1:]+[val]
else:
self.data+=[val]
l = ListOfFive()
for i in range(1,10):
l.add(i)
print l.data
Output is:
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
Another neat ring buffer implementation can be found in the ActiveState Recipes - your ring buffer object starts out as an instance of RingBuffer while filling up at first, and then your instance changes its class to RingBufferFull, an optimized full implementation. It always makes me smile.
class RingBuffer:
def __init__(self,size_max):
self.max = size_max
self.data = []
def append(self,x):
"""append an element at the end of the buffer"""
self.data.append(x)
if len(self.data) == self.max:
self.cur=0
self.__class__ = RingBufferFull
def get(self):
""" return a list of elements from the oldest to the newest"""
return self.data
class RingBufferFull:
def __init__(self,n):
raise "you should use RingBuffer"
def append(self,x):
self.data[self.cur]=x
self.cur=(self.cur+1) % self.max
def get(self):
return self.data[self.cur:]+self.data[:self.cur]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With