
Iterate over two nested 2D lists where list2 has list1's row numbers

Tags: python, loops, list

I'm new to Python, so I want to do this with plain loops, without anything fancy like generators. I have two 2D lists, one of integers and one of strings:

  1. Integer 2D list:

    Here, dataset2d[0][0] is the number of rows in the table and dataset2d[0][1] is the number of columns. So the 2D list below has 6 rows and 4 columns:

    dataset2d = [
        [6, 4],
        [0, 0, 0, 1],
        [1, 0, 2, 0],
        [2, 2, 0, 1],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
        [1, 0, 2, 1]
    ]
    
  2. String 2D list:

    partition2d = [
        ['A', '1', '2', '4'],
        ['B', '3', '5'],
        ['C', '6']
    ]
    

    partition2d[*][0], i.e. the first column, is a label. For group A, 1, 2 and 4 are the row numbers that I need to pick from dataset2d and apply a formula to. That is, I read 1, go to row 1 of dataset2d and read the first column value, dataset2d[1][0]; then I read 2 from partition2d, go to row 2 of dataset2d and read its first column, dataset2d[2][0]. Similarly, I next read dataset2d[4][0].

    Then I do some calculations, get a value, and store it in a 2D list, then move to the next column of dataset2d for those rows. In this example, the next values read would be dataset2d[1][1], dataset2d[2][1], dataset2d[4][1]. Again I do a calculation, get one value for that column, and store it. I repeat this until I reach the last column of dataset2d.

    The next row in partition2d is [B, 3, 5], so I start with dataset2d[3][0], dataset2d[5][0] and get a value for that column by the formula. Then I read dataset2d[3][1], dataset2d[5][1], etc., until I reach the last column. I do this until all rows in partition2d have been read.

What I tried:

    for partitionRow in partition2d:
        for partitionCol in partitionRow:
            for colDataset in dataset2d:
                print(dataset2d[partitionCol][colDataset])

What problem I'm facing:

  1. partition2d is a string array where I need to skip the first column which has characters like A,B,C.
  2. I want to iterate in dataset2d column wise only over the row numbers given in partition2d. So the colDataset should increment only after I'm done with that column.

Update1:

I'm reading the contents from a text file, and the data in 2D lists can vary, depending on file content and size, but the structure of file1 i.e dataset2d and file2 i.e partition2d will be the same.

Update2: Since Eric asked what the output should look like:

 0.842322 0.94322 0.34232 0.900009    (For A)
 0.642322 0.44322 0.24232 0.800009    (For B)

This is just an example; I typed the numbers at random. The first number, 0.842322, is the result of applying the formula to column 0 of dataset2d, i.e. dataset2d[partitionCol][0], for group A, considering rows 1, 2, 4.

The second number, 0.94322, is the result of applying the formula to column 1 of dataset2d, i.e. dataset2d[partitionCol][1], for group A, considering rows 1, 2, 4.

The third number, 0.34232, is the result of applying the formula to column 2 of dataset2d, i.e. dataset2d[partitionCol][2], for group A, considering rows 1, 2, 4. Similarly we get 0.900009.

The first number in the second row, 0.642322, is the result of applying the formula to column 0 of dataset2d, i.e. dataset2d[partitionCol][0], for group B, considering rows 3, 5. And so on.

user2441441 asked Feb 15 '23 05:02


2 Answers

You can use NumPy (I hope that isn't too fancy for you):

import numpy

dataset2D = [ [6, 4], [0, 0, 0, 1], [1, 0, 2, 0], [2, 2, 0, 1], [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 2, 1] ]
dataset2D_size = dataset2D[0]
# Drop the [rows, cols] header: NumPy needs rows of equal length
data = numpy.array(dataset2D[1:])
partition2D = [ ['A', '1', '2', '4'], ['B', '3', '5'], ['C', '6'] ]

for partition in partition2D:
    label = partition[0]

    # Row numbers are 1-based strings; the header is gone, so subtract 1
    row_indices = [int(i) - 1 for i in partition[1:]]

    # Take the specified rows
    rows = data[row_indices]

    # Iterate over the columns; tolist() gives plain Python ints
    for column in zip(*rows.tolist()):
        # column holds one column of data from the specified rows
        print(column, end=' ')  # Apply your formula here
    print()

Or, if you don't want to install NumPy, here is a pure-Python version (which is actually what you asked for):

dataset2D = [ [6, 4], [0, 0, 0, 1], [1, 0, 2, 0], [2, 2, 0, 1], [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 2, 1] ]
partition2D = [ ['A', '1', '2', '4'], ['B', '3', '5'], ['C', '6'] ]

dataset2D_size = dataset2D[0]

for partition in partition2D:
    label = partition[0]

    row_indices = [int(i) for i in partition[1:]]

    rows = [dataset2D[row_idx] for row_idx in row_indices]

    for column in zip(*rows):
        print(column, end=' ')
    print()

Both will print:

(0, 1, 1) (0, 0, 1) (0, 2, 1) (1, 0, 0)
(2, 0) (2, 0) (0, 1) (1, 1)
(1,) (0,) (2,) (1,)

Explanation of the second version (without NumPy):

[dataset2D[row_idx] for row_idx in row_indices]

This takes each requested row (dataset2D[row_idx]) and collects them into a list. The result of the expression is therefore a list of lists, containing only the rows at the specified indices.
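For readers who prefer plain loops, the list comprehension above can be unrolled. This is a minimal sketch that reuses the sample data from the question and looks only at group A; it is meant as an illustration of what the comprehension does, not as a replacement for it.

```python
dataset2D = [[6, 4], [0, 0, 0, 1], [1, 0, 2, 0], [2, 2, 0, 1],
             [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 2, 1]]
partition = ['A', '1', '2', '4']

# Skip the label in position 0 and convert the remaining strings to ints
row_indices = []
for text in partition[1:]:
    row_indices.append(int(text))

# Same result as: [dataset2D[row_idx] for row_idx in row_indices]
rows = []
for row_idx in row_indices:
    rows.append(dataset2D[row_idx])

print(rows)  # [[0, 0, 0, 1], [1, 0, 2, 0], [1, 1, 1, 0]]
```

Slicing with partition[1:] is what skips the label column, which addresses the first problem raised in the question.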

for column in zip(*rows):

Then zip(*rows) iterates column-wise, which is what you want. It works by taking the first element of each row and combining them into a tuple, then the second element of each row, and so on. In each iteration, the resulting tuple is stored in the variable column.

So inside the for column in zip(*rows): loop you already have the column-wise elements from the specified rows.
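To make the transposition concrete, here is a small sketch using the three rows selected for group A. It compares zip(*rows) with an index-based double loop that produces the same columns; the name columns_by_index is only illustrative.

```python
rows = [[0, 0, 0, 1],
        [1, 0, 2, 0],
        [1, 1, 1, 0]]

# zip(*rows) is the same as zip(rows[0], rows[1], rows[2])
columns = list(zip(*rows))
print(columns)  # [(0, 1, 1), (0, 0, 1), (0, 2, 1), (1, 0, 0)]

# Loop-only equivalent: outer loop over columns, inner loop over rows
columns_by_index = []
for col_num in range(len(rows[0])):
    column = []
    for row in rows:
        column.append(row[col_num])
    columns_by_index.append(tuple(column))

print(columns_by_index == columns)  # True
```

The index-based form assumes every selected row has the same length, which holds for the data rows in the question.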

To apply your formula, replace the print of column with whatever you need to do. For example, here the code is modified to include the row and column numbers:

print('Processing partition %s' % label)
for (col_num, column) in enumerate(zip(*rows)):
    print('Column number: %d' % col_num)
    for (row_num, element) in enumerate(column):
        print('[%d,%d]: %d' % (row_indices[row_num], col_num, element))

which will result in:

Processing partition A
Column number: 0
[1,0]: 0
[2,0]: 1
[4,0]: 1
Column number: 1
[1,1]: 0
[2,1]: 0
[4,1]: 1
Column number: 2
[1,2]: 0
[2,2]: 2
[4,2]: 1
Column number: 3
[1,3]: 1
[2,3]: 0
[4,3]: 0
Processing partition B
Column number: 0
[3,0]: 2
[5,0]: 0
Column number: 1
[3,1]: 2
[5,1]: 0
Column number: 2
[3,2]: 0
[5,2]: 1
Column number: 3
[3,3]: 1
[5,3]: 1
Processing partition C
Column number: 0
[6,0]: 1
Column number: 1
[6,1]: 0
Column number: 2
[6,2]: 2
Column number: 3
[6,3]: 1
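The question also asks for one computed value per column, stored in a 2D list with one row per group. The sketch below shows one way that might look. The formula is not given in the question, so column_mean is a placeholder assumption (a plain average) standing in for the real calculation, and results is an illustrative name.

```python
dataset2D = [[6, 4], [0, 0, 0, 1], [1, 0, 2, 0], [2, 2, 0, 1],
             [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 2, 1]]
partition2D = [['A', '1', '2', '4'], ['B', '3', '5'], ['C', '6']]

def column_mean(column):
    # Placeholder formula (an assumption): swap in the real calculation
    return sum(column) / float(len(column))

results = []  # one row per partition, one value per dataset column
for partition in partition2D:
    row_indices = [int(i) for i in partition[1:]]
    rows = [dataset2D[row_idx] for row_idx in row_indices]
    result_row = []
    for column in zip(*rows):
        result_row.append(column_mean(column))
    results.append(result_row)

# Print each group's label next to its row of results
for partition, result_row in zip(partition2D, results):
    print('%s: %s' % (partition[0], result_row))
```

Each inner list of results then corresponds to one line of the desired output, for example results[1] holds the four values for group B.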

I hope this helps.

justhalf answered Apr 27 '23 08:04


Here's an extensible solution using an iterator:

def partitions(data, p):
    for partition in p:
        label = partition[0]
        # Row numbers are stored as strings, so convert them to ints
        row_indices = [int(i) for i in partition[1:]]
        # Use the data argument rather than a global
        rows = [data[row_idx] for row_idx in row_indices]
        columns = zip(*rows)

        yield label, columns

for label, columns in partitions(dataset2D, partition2D):
    print("Processing", label)
    for column in columns:
        print(column)
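One thing to watch with this approach: on Python 3, zip returns a one-shot iterator, so the columns can be looped over only once. If they need to be kept or traversed again, wrapping them in list() is one option. A minimal sketch, assuming the sample data from the question; columns_by_label is an illustrative name that does not appear in the answer above.

```python
dataset2D = [[6, 4], [0, 0, 0, 1], [1, 0, 2, 0], [2, 2, 0, 1],
             [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 2, 1]]
partition2D = [['A', '1', '2', '4'], ['B', '3', '5'], ['C', '6']]

def partitions(data, p):
    for partition in p:
        label = partition[0]
        row_indices = [int(i) for i in partition[1:]]
        rows = [data[row_idx] for row_idx in row_indices]
        # list(...) so the columns can be traversed more than once
        yield label, list(zip(*rows))

# Collect every group's columns, keyed by its label
columns_by_label = {}
for label, columns in partitions(dataset2D, partition2D):
    columns_by_label[label] = columns

print(columns_by_label['B'])  # [(2, 0), (2, 0), (0, 1), (1, 1)]
```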

Eric answered Apr 27 '23 08:04