The maximum length of a list is platform dependent, limited by the address space and/or available RAM. The maxsize constant defined in the sys module is 2**63 - 1 on a 64-bit system. The largest positive integer supported by the platform's Py_ssize_t type is the maximum size that lists, strings, dicts, and many other containers can have.
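You can check this yourself (a minimal sketch; the exact value depends on your platform):
import sys

# On a typical 64-bit build this prints 9223372036854775807, i.e. 2**63 - 1.
print(sys.maxsize)
print(sys.maxsize == 2**63 - 1)  # True on 64-bit systems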
To initialize a list in Python, assign one with square brackets, use the list() constructor, create a list of placeholder elements with multiplication, or use a list comprehension. The most common way to declare a list in Python is with square brackets.
To create a list of n placeholder elements, multiply a single-element list by n. For example, use [None] * 5 to create the list [None, None, None, None, None] with five None elements. You can then overwrite individual elements with index assignments.
You can create an empty list using an empty pair of square brackets [] or the type constructor list(), a built-in that creates an empty list when no arguments are passed. Square brackets [] are more commonly used in Python to create empty lists because they are faster and more concise.
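For example, here are those options side by side (a quick sketch):
empty = []                           # literal: the common, fast way
also_empty = list()                  # constructor: equivalent, slightly slower
placeholders = [None] * 5            # [None, None, None, None, None]
placeholders[2] = "overwritten"      # index assignment into a placeholder slot
squares = [i * i for i in range(5)]  # comprehension: [0, 1, 4, 9, 16]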
def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result
Results (each function evaluated 144 times, durations averaged):
simple append 0.0102
pre-allocate 0.0098
Conclusion. It barely matters.
Premature optimization is the root of all evil.
Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:
l = [None] * 1000  # make a list of 1000 Nones
for i in range(1000):
    # baz
    l[i] = bar
    # qux
Perhaps you could avoid the list by using a generator instead:
def my_things():
    while foo:
        # baz
        yield bar
        # qux

for thing in my_things():
    # do something with thing
This way, the list is never stored in memory all at once; the values are merely generated as needed.
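You can see the memory difference directly (a rough sketch; exact sizes vary by Python version and platform):
import sys

full_list = [x for x in range(1000000)]
lazy_gen = (x for x in range(1000000))

# The list object holds a million references; the generator is a small fixed-size object.
print(sys.getsizeof(full_list))  # several megabytes
print(sys.getsizeof(lazy_gen))   # a couple hundred bytes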
Short version: use
pre_allocated_list = [None] * size
to preallocate a list (that is, to be able to address 'size' elements of the list instead of gradually forming the list by appending). This operation is very fast, even on big lists. Allocating new objects that will be later assigned to list elements will take much longer and will be the bottleneck in your program, performance-wise.
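You can verify both claims with the standard timeit module (a sketch; absolute numbers depend on your machine):
import timeit

# Preallocating a million None references is a single cheap operation...
print(timeit.timeit('[None] * 1000000', number=10))
# ...while creating a distinct object per element dominates the cost.
print(timeit.timeit('[str(i) for i in range(1000000)]', number=10))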
Long version:
I think that initialization time should be taken into account.
Since in Python everything is a reference, it doesn't matter whether you set each element to None or to some string - either way it's only a reference. Though it will take longer if you want to create a new object for each element to reference.
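A small sketch makes the point: every slot of a multiplied list holds a reference to the same object.
a = [None] * 3
print(a[0] is a[1] is a[2])  # True: three slots, one None object

s = ["x"] * 3
print(s[0] is s[1] is s[2])  # True: one string object, three references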
For Python 3.2:
import time
import copy

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print("{} took {} ms".format(func.__name__, (t2 - t1) * 1000.0))
        return res
    return wrapper

@print_timing
def prealloc_array(size, init=None, cp=True, cpmethod=copy.deepcopy, cpargs=(), use_num=False):
    result = [None] * size
    if init is not None:
        if cp:
            # Fill every slot with a reference to the same init object.
            for i in range(size):
                result[i] = init
        else:
            if use_num:
                # Create a new object per element from the index.
                for i in range(size):
                    result[i] = cpmethod(i)
            else:
                # Create a new object per element from fixed arguments.
                for i in range(size):
                    result[i] = cpmethod(cpargs)
    return result

@print_timing
def prealloc_array_by_appending(size):
    result = []
    for i in range(size):
        result.append(None)
    return result

@print_timing
def prealloc_array_by_extending(size):
    result = []
    none_list = [None]
    for i in range(size):
        result.extend(none_list)
    return result

def main():
    n = 1000000
    x = prealloc_array_by_appending(n)
    y = prealloc_array_by_extending(n)
    a = prealloc_array(n, None)
    b = prealloc_array(n, "content", True)
    c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)  # ("blah") is just the string "blah", not a tuple
    d = prealloc_array(n, "content", False, "some object {}".format, None, True)
    e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
    f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
    g = prealloc_array(n, "content", False, copy.deepcopy, [], False)
    print("x[5] = {}".format(x[5]))
    print("y[5] = {}".format(y[5]))
    print("a[5] = {}".format(a[5]))
    print("b[5] = {}".format(b[5]))
    print("c[5] = {}".format(c[5]))
    print("d[5] = {}".format(d[5]))
    print("e[5] = {}".format(e[5]))
    print("f[5] = {}".format(f[5]))
    print("g[5] = {}".format(g[5]))

if __name__ == '__main__':
    main()
Evaluation:
prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = ()
g[5] = []
As you can see, just making a big list of references to the same None object takes very little time.
Appending or extending takes longer (I didn't average anything, but after running this a few times I can tell you that extending and appending take roughly the same time).
Allocating a new object for each element - that is what takes the most time. And S.Lott's answer does that - it formats a new string every time. Which is not strictly required - if you want to preallocate some space, just make a list of None, then assign data to list elements at will. Either way it takes more time to generate data than to append/extend a list, whether you generate it while creating the list or afterwards. But if you want a sparsely-populated list, then starting with a list of None is definitely faster.
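For example, a sparsely-populated list might be filled like this (indices and values are made up for illustration):
sparse = [None] * 10  # preallocate
sparse[2] = "foo"     # populate only the slots you need
sparse[7] = "bar"
print(sparse)         # [None, None, 'foo', None, None, None, None, 'bar', None, None]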
The Pythonic way for this is:
x = [None] * numElements
Or whatever default value you wish to prepopulate with, e.g.
bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche
(Caveat emptor: the [Beer()] * 99 syntax creates one Beer and then populates an array with 99 references to the same single instance.)
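This matters as soon as the instances are mutable. A sketch of the pitfall and the usual fix, assuming a simple Beer class with a mutable attribute:
class Beer:
    def __init__(self):
        self.sips = 0

shared = [Beer()] * 3                  # three references to ONE Beer
shared[0].sips += 1
print(shared[1].sips)                  # 1 - every "bottle" was sipped

separate = [Beer() for _ in range(3)]  # three distinct Beers
separate[0].sips += 1
print(separate[1].sips)                # 0 - the others are untouched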
Python's default approach can be pretty efficient, although that efficiency decays as you increase the number of elements.
Compare
import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # milliseconds
        print('%fms' % msecs)

Elements = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))

def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))

def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1

test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)
with
#include <vector>
#include <iostream>
#include <chrono>

using namespace std;

typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";
    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';
    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}
On my Windows 7 Core i7, 64-bit Python gives
Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms
While C++ gives (built with Microsoft Visual C++, 64-bit, optimizations enabled)
Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms
C++ debug build produces:
Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms
The point here is that with Python you can achieve a measurable performance improvement (roughly 25% here, comparing doAppend with doAllocate), and if you think you're writing a high-performance application (or something that is used in a web service), then that isn't to be sniffed at, but you may need to rethink your choice of language.
Also, the Python code here isn't really Python code. Switching to truly Pythonesque code here gives better performance:
import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # milliseconds
        print('%fms' % msecs)

Elements = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))

def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))

def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()

test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)
Which gives
Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms
(in 32-bit, doGenerator does better than doAllocate).
Here the gap between doAppend and doAllocate is significantly larger.
Obviously, the differences here really only apply if you are doing this more than a handful of times, if you are doing it on a heavily loaded system where those numbers will get scaled out by orders of magnitude, or if you are dealing with considerably larger lists.
The point here: Do it the Pythonic way for the best performance.
But if you are worrying about general, high-level performance, Python is the wrong language. The most fundamental problem is that Python function calls have traditionally been up to 300x slower than in other languages, due to Python features like decorators, etc. (PythonSpeed/PerformanceTips, Data Aggregation).