Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to define a large list array in Python using For loop or Vectorization?

Tags:

python

arrays

I have a 2D list of 416 rows, each row having 4 columns. Rows 1-4 contain their row number 4 times (i.e., [...[1,1,1,1],[2,2,2,2]...]. Row 330 contains [41,22,13,13]. Everything else is [0,0,0,0]. Currently I am using a for loop with many explicit if statements.

myList = [[0,0,0,0]]
for i in range(1, 416):
    if i == 1 or i == 2 or i == 3 or i == 4:
        myList.append([i,i,i,i])
    elif i == 330:
        myList.append([41,22,13,13])
    else:
        myList.append([0,0,0,0])

What is a more efficient way for me to define this array?

The other questions I have seen on SO don't seem to explicitly address this question, but if someone finds one that can be considered a duplicate please mark this question as such and I will accept it.

like image 953
SeeDerekEngineer Avatar asked Jul 12 '17 14:07

SeeDerekEngineer


People also ask

How do I make a list loop in python?

You can use a for loop to create a list of elements in three steps: Instantiate an empty list. Loop over an iterable or range of elements. Append each element to the end of the list.

Why do we use vectorization in python?

Vectorization is used to speed up the Python code without using loop. Using such a function can help in minimizing the running time of code efficiently.

Why is NumPy faster than for loop?

NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.


2 Answers

Since the bulk of the list is a sublist of zeros (in array term, sparse), I'll just preallocate and then use slice assignment/indexing to update:

my_list = [[0, 0, 0, 0] for _ in range(416)]
my_list[330] = [41,22,13,13]
my_list[1:5] = [[i]*4 for i in range(1, 5)]

This avoids repeated branch evaluation for a host of false cases and the repeated appends.

OTOH, having zeros all over your memory can be avoided if you actually keep the data structure as a sparse matrix. You could have a look at the SciPy 2-D sparse matrix package.

like image 84
Moses Koledoye Avatar answered Oct 06 '22 00:10

Moses Koledoye


Off the top of my head, a really simple and fast way to do it:

import itertools

answer = list(itertools.chain([[i,i,i,i] for i in range(1,5)], [[0,0,0,0] for _ in range(330-6)], [[41,22,13,13]], [[0,0,0,0] for _ in range(416-330)]))
like image 32
inspectorG4dget Avatar answered Oct 06 '22 00:10

inspectorG4dget