Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding a Pattern in a Grid Python [duplicate]

Tags:

python

I have randomly generated grid containing 0 and 1:

1 1 0 0 0 1 0 1
1 1 1 0 1 1 1 1
1 0 0 0 1 0 1 1
0 0 1 0 1 0 1 1
1 1 1 1 0 0 1 1
0 0 1 1 1 1 1 0
0 1 0 0 1 0 1 1

How can I iterate through the grid to find the largest cluster of 1s, that is equal or larger than 4 items (across row and column)?

I assume I need to keep a count of each found cluster while iterating and ones its more than 4 items, record and count in a list and then find the largest number.

The problem is that I cannot figure out how to do so across both rows and columns and record the count. I have iterated through the grid but not sure how to move further than two rows.

For example in the above example, the largest cluster is 8. There are some other clusters in the grid, but they have 4 elements:

A A 0 0 0 1 0 1
A A 1 0 1 1 1 1
1 0 0 0 1 0 1 1
0 0 1 0 1 0 1 1
1 1 B B 0 0 1 1
0 0 B B 1 1 1 0
0 1 0 0 1 0 1 1

The code I tried:

rectcount = []
for row in range(len(grid)):
    for num in range(len(grid[row])):

    # count = 0
        try:

            # if grid[row][num] == 1:
                # if grid[row][num] == grid[row][num + 1] == grid[row + 1][num] == grid[row + 1][num + 1]:
                    # count += 1

            if grid[row][num] == grid[row][num + 1]:
                if grid[row + 1][num] == grid[row][num + 1]:
                    count += 1

                # if grid[row][num] == grid[row][num + 1] and grid[row][num] == grid[row + 1][num]:
                    # count += 1
                else:
                    count = 0

            if grid[row][num] == grid[row + 1][num]:
                count += 1
        except:
            pass
like image 311
W Szum Avatar asked Oct 30 '20 05:10

W Szum


1 Answers

I've implemented three algorithms.

First algorithm is Simple, using easiest approach of nested loops, it has O(N^5) time complexity (where N is one side of input grid, 10 for our case), for our inputs of size 10x10 time of O(10^5) is quite alright. Algo id in code is algo = 0. If you just want to see this algorithm jump to line ------ Simple Algorithm inside code.

Second algorithm is Advanced, using Dynamic Programming approach, its complexity is O(N^3) which is much faster than first algorithm. Algo id in code is algo = 1. Jump to line ------- Advanced Algorithm inside code.

Third algorithm Simple-ListComp I implemented just for fun, it is almost same like Simple, same O(N^5) complexity, but using Python's list comprehensions instead of regular loops, that's why it is shorter, also a bit slower because doesn't use some optimizations. Algo id in code is algo = 2. Jump to line ------- Simple-ListComp Algorithm inside code to see algo.

The rest of code, besides algorithms, implements checking correctness of results (double-checking between algorithms), printing results, producing text inputs. Code is split into solving-task function solve() and testing function test(). solve() function has many arguments to allow configuring behavior of function.

All main code lines are documented by comments, read them to learn how to use code. Basically if s variable contains multi-line text with grid elements, same like in your question, you just run solve(s, text = True) and it will solve task and print results. Also you may choose algorithm out of two versions (0 (Simple) and 1 (Advanced) and 2 (Simple-ListComp)) by giving next arguments to solve function algo = 0, check = False (here 0 for algo 0). Look at test() function body to see simplest example of usage.

Algorithms output to console by default all clusters, from largest to smallest, largest is signified by . symbol, the rest by B, C, D, ..., Z symbols. You may set argument show_non_max = False in solve function if you want only first (largest) cluster to be shown.

I'll explain Simple algorithm:

  1. Basically what algorithm does - it searches through all possible angled 1s rectangles and stores info about maximal of them into ma 2D array. Top-left point of such rectangle is (i, j), top-right - (i, k), bottom-left - (l, j + angle_offset), bottom-right - (l, k + angle_offset), all 4 corners, that's why we have so many loops.
  2. In outer two i (row) , j (column) loops we iterate over whole grid, this (i, j) position will be top-left point of 1s rectangle, we need to iterate whole grid because all possible 1s rectangles may have top-left at any (row, col) point of whole grid. At start of j loop we check that grid at (i, j) position should always contain 1 because inside loops we search for all rectangle with 1s only.
  3. k loop iterates through all possible top-right positions (i, k) of 1s rectangle. We should break out of loop if (i, k) equals to 0 because there is no point to extend k further to right because such rectangle will always contain 0.
  4. In previous loops we fixed top-left and top-right corners of rectangle. Now we need to search for two bottom corners. For that we need to extend rectangle downwards at different angles till we reach first 0.
  5. off loop tries extending rectangle downwards at all possible angles (0 (straight vertical), +1 (45 degrees shifted to the right from top to bottom), -1 (-45 degrees)), off basically is such number that grid[y][x] is "above" (corresponds to by Y) grid[y + 1][x + off].
  6. l tries to extend rectangle downwards (in Y direction) at different angles off. It is extended till first 0 because it can't be extended further then (because each such rectangle will already contain 0).
  7. Inside l loop there is if grid[l][max(0, j + off * (l - i)) : min(k + 1 + off * (l - i), c)] != ones[:k - j + 1]: condition, basically this if is meant to check that last row of rectangle contains all 1 if not this if breaks out of loop. This condition compares two list slices for non-equality. Last row of rectangle spans from point (l, j + angle_offset) (expression max(0, j + off * (l - i)), max-limited to be 0 <= X) to point (l, k + angle_offset) (expression min(k + 1 + off * (l - i), c), min-limited to be X < c).
  8. Inside l loop there are other lines, ry, rx = l, k + off * (l - i) computes bottom-right point of rectangle (ry, rx) which is (l, k + angle_offset), this (ry, rx) position is used to store found maximum inside ma array, this array stores all maximal found rectangles, ma[ry][rx] contains info about rectangle that has bottom-right at point (ry, rx).
  9. rv = (l + 1 - i, k + 1 - j, off) line computes new possible candidate for ma[ry][rx] array entry, possible because ma[ry][rx] is updated only if new candidate has larger area of 1s. Here rv[0] value inside rv tuple contains height of such rectangle, rv[1] contains width of such rectangle (width equals to the length of bottom row of rectangle), rv[2] contains angle of such rectangle.
  10. Condition if rv[0] * rv[1] > ma[ry][rx][0] * ma[ry][rx][1]: and its body just checks if rv area is larger than current maximum inside array ma[ry][rx] and if it is larger then this array entry is updated (ma[ry][rx] = rv). I'll remind that ma[ry][rx] contains info (width, height, angle) about current found maximal-area rectangle that has bottom-right point at (ry, rx) and that has these width, height and angle.
  11. Done! After algorithm run array ma contains information about all maximal-area angled rectangles (clusters) of 1s so that all clusters can be restored and printed later to console. Largest of all such 1s-clusters is equal to some rv0 = ma[ry0][rx0], just iterate once through all elements of ma and find such point (ry0, rx0) so that ma[ry0][rx0][0] * ma[ry0][rx0][1] (area) is maximal. Then largest cluster will have bottom-right point (ry0, rx0), bottom-left point (ry0, rx0 - rv0[1] + 1), top-right point (ry0 - rv0[0] + 1, rx0 - rv0[2] * (rv0[0] - 1)), top-left point (ry0 - rv0[0] + 1, rx0 - rv0[1] + 1 - rv0[2] * (rv0[0] - 1)) (here rv0[2] * (rv0[0] - 1) is just angle offset, i.e. how much shifted is first row along X compared to last row of rectangle).

Try it online!

# ----------------- Main function solving task -----------------

def solve(
    grid, *,
    algo = 1, # Choose algorithm, 0 - Simple, 1 - Advanced, 2 - Simple-ListComp
    check = True, # If True run all algorithms and check that they produce same results, otherwise run just chosen algorithm without checking
    text = False, # If true then grid is a multi-line text (string) having grid elements separated by spaces
    print_ = True, # Print results to console
    show_non_max = True, # When printing if to show all clusters, not just largest, as B, C, D, E... (chars from "cchars")
    cchars = ['.'] + [chr(ii) for ii in range(ord('B'), ord('Z') + 1)], # Clusters-chars, these chars are used to show clusters from largest to smallest
    one = None, # Value of "one" inside grid array, e.g. if you have grid with chars then one may be equal to "1" string. Defaults to 1 (for non-text) or "1" (for text).
    offs = [0, +1, -1], # All offsets (angles) that need to be checked, "off" is such that grid[i + 1][j + off] corresponds to next row of grid[i][j]
    debug = False, # If True, extra debug info is printed
):
    # Preparing
    
    assert algo in [0, 1, 2], algo
    if text:
        grid = [l.strip().split() for l in grid.splitlines() if l.strip()]
    if one is None:
        one = 1 if not text else '1'
    r, c = len(grid), len(grid[0])
    sgrid = '\n'.join([''.join([str(grid[ii][jj]) for jj in range(c)]) for ii in range(r)])
    mas, ones = [], [one] * max(c, r)
    
    # ----------------- Simple Algorithm, O(N^5) Complexity -----------------
        
    if algo == 0 or check:
        ma = [[(0, 0, 0) for jj in range(c)] for ii in range(r)] # Array containing maximal answers, Lower-Right corners
        
        for i in range(r):
            for j in range(c):
                if grid[i][j] != one:
                    continue
                for k in range(j + 1, c): # Ensure at least 2 ones along X
                    if grid[i][k] != one:
                        break
                    for off in offs:
                        for l in range(i + 1, r): # Ensure at least 2 ones along Y
                            if grid[l][max(0, j + off * (l - i)) : min(k + 1 + off * (l - i), c)] != ones[:k - j + 1]:
                                l -= 1
                                break
                            ry, rx = l, k + off * (l - i)
                            rv = (l + 1 - i, k + 1 - j, off)
                            if rv[0] * rv[1] > ma[ry][rx][0] * ma[ry][rx][1]:
                                ma[ry][rx] = rv
                                
        mas.append(ma)
        ma = None
                    
    # ----------------- Advanced Algorithm using Dynamic Programming, O(N^3) Complexity -----------------

    if algo == 1 or check:
        ma = [[(0, 0, 0) for jj in range(c)] for ii in range(r)] # Array containing maximal answers, Lower-Right corners
        
        for off in offs:
            d = [[(0, 0, 0) for jj in range(c)] for ii in range(c)]
            for i in range(r):
                f, d_ = 0, [[(0, 0, 0) for jj in range(c)] for ii in range(c)]
                for j in range(c):
                    if grid[i][j] != one:
                        f = j + 1
                        continue
                    if f >= j:
                        # Check that we have at least 2 ones along X
                        continue
                    df = [(0, 0, 0) for ii in range(c)]
                    for k in range(j, -1, -1):
                        t0 = d[j - off][max(0, k - off)] if 0 <= j - off < c and k - off < c else (0, 0, 0)
                        if k >= f:
                            t1 = (t0[0] + 1, t0[1], off) if t0 != (0, 0, 0) else (0, 0, 0)
                            t2 = (1, j - k + 1, off)
                            t0 = t1 if t1[0] * t1[1] >= t2[0] * t2[1] else t2
                            
                            # Ensure that we have at least 2 ones along Y
                            t3 = t1 if t1[0] > 1 else (0, 0, 0)
                            if k < j and t3[0] * t3[1] < df[k + 1][0] * df[k + 1][1]:
                                t3 = df[k + 1]
                            df[k] = t3
                        else:
                            t0 = d_[j][k + 1]
                        if k < j and t0[0] * t0[1] < d_[j][k + 1][0] * d_[j][k + 1][1]:
                            t0 = d_[j][k + 1]
                        d_[j][k] = t0
                    if ma[i][j][0] * ma[i][j][1] < df[f][0] * df[f][1]:
                        ma[i][j] = df[f]
                d = d_
                
        mas.append(ma)
        ma = None
        
    # ----------------- Simple-ListComp Algorithm using List Comprehension, O(N^5) Complexity -----------------
        
    if algo == 2 or check:
        ma = [
            [
                max([(0, 0, 0)] + [
                    (h, w, off)
                    for h in range(2, i + 2)
                        for w in range(2, j + 2)
                            for off in offs
                    if all(
                        cr[
                            max(0, j + 1 - w - off * (h - 1 - icr)) :
                            max(0, j + 1 - off * (h - 1 - icr))
                        ] == ones[:w]
                        for icr, cr in enumerate(grid[max(0, i + 1 - h) : i + 1])
                    )
                ], key = lambda e: e[0] * e[1])
                for j in range(c)
            ]
            for i in range(r)
        ]
        mas.append(ma)
        ma = None
    
    # ----------------- Checking Correctness and Printing Results -----------------

    if check:
        # Check that we have same answers for all algorithms
        masx = [[[cma[ii][jj][0] * cma[ii][jj][1] for jj in range(c)] for ii in range(r)] for cma in mas]
        assert all([masx[0] == e for e in masx[1:]]), 'Maximums of algorithms differ!\n\n' + sgrid + '\n\n' + (
            '\n\n'.join(['\n'.join([' '.join([str(e1).rjust(2) for e1 in e0]) for e0 in cma]) for cma in masx])
        )

    ma = mas[0 if not check else algo]

    if print_:
        cchars = ['.'] + [chr(ii) for ii in range(ord('B'), ord('Z') + 1)] # These chars are used to show clusters from largest to smallest
        res = [[grid[ii][jj] for jj in range(c)] for ii in range(r)]
        mac = [[ma[ii][jj] for jj in range(c)] for ii in range(r)]
        processed = set()
        sid = 0
        for it in range(r * c):
            sma = sorted(
                [(mac[ii][jj] or (0, 0, 0)) + (ii, jj) for ii in range(r) for jj in range(c) if (ii, jj) not in processed],
                key = lambda e: e[0] * e[1], reverse = True
            )
            if len(sma) == 0 or sma[0][0] * sma[0][1] <= 0:
                break
            maxv = sma[0]
            if it == 0:
                maxvf = maxv
            processed.add((maxv[3], maxv[4]))
            show = True
            for trial in [True, False]:
                for i in range(maxv[3] - maxv[0] + 1, maxv[3] + 1):
                    for j in range(maxv[4] - maxv[1] + 1 - (maxv[3] - i) * maxv[2], maxv[4] + 1 - (maxv[3] - i) * maxv[2]):
                        if trial:
                            if mac[i][j] is None:
                                show = False
                                break
                        elif show:
                            res[i][j] = cchars[sid]
                            mac[i][j] = None
            if show:
                sid += 1
            if not show_non_max and it == 0:
                break
        res = '\n'.join([''.join([str(res[ii][jj]) for jj in range(c)]) for ii in range(r)])
        print(
            'Max:\nArea: ', maxvf[0] * maxvf[1], '\nSize Row,Col: ', (maxvf[0], maxvf[1]),
            '\nLowerRight Row,Col: ', (maxvf[3], maxvf[4]), '\nAngle: ', ("-1", " 0", "+1")[maxvf[2] + 1], '\n', sep = ''
        )
        print(res)
        if debug:
            # Print all computed maximums, for debug purposes
            for cma in [ma, mac]:
                print('\n' + '\n'.join([' '.join([f'({e0[0]}, {e0[1]}, {("-1", " 0", "+1")[e0[2] + 1]})' for e0_ in e for e0 in (e0_ or ('-', '-', 0),)]) for e in cma]))
        print(end = '-' * 28 + '\n')
    
    return ma

# ----------------- Testing -----------------

def test():
    # Iterating over text inputs or other ways of producing inputs
    for s in [
        """
        1 1 0 0 0 1 0 1
        1 1 1 0 1 1 1 1
        1 0 0 0 1 0 1 1
        0 0 1 0 1 0 1 1
        1 1 1 1 0 0 1 1
        0 0 1 1 1 1 1 0
        0 1 0 0 1 0 1 1
        """,
        """
        1 0 1 1 0 1 0 0
        0 1 1 0 1 0 0 1
        1 1 0 0 0 0 0 1
        0 1 1 1 0 1 0 1
        0 1 1 1 1 0 1 1
        1 1 0 0 0 1 0 0
        0 1 1 1 0 1 0 1
        """,
        """
        0 1 1 0 1 0 1 1
        0 0 1 1 0 0 0 1
        0 0 0 1 1 0 1 0
        1 1 0 0 1 1 1 0
        0 1 1 0 0 1 1 0
        0 0 1 0 1 0 1 1
        1 0 0 1 0 0 0 0
        0 1 1 0 1 1 0 0
        """
    ]:
        solve(s, text = True)

if __name__ == '__main__':
    test()

Output:

Max:
Area: 8
Size Row,Col: (4, 2)
LowerRight Row,Col: (4, 7)
Angle:  0

CC000101
CC1011..
100010..
001010..
1BBB00..
00BBBDD0
010010DD
----------------------------
Max:
Area: 6
Size Row,Col: (3, 2)
LowerRight Row,Col: (2, 1)
Angle: -1

10..0100
0..01001
..000001
0BBB0101
0BBB1011
CC000100
0CC10101
----------------------------
Max:
Area: 12
Size Row,Col: (6, 2)
LowerRight Row,Col: (5, 7)
Angle: +1

0..01011
00..0001
000..010
BB00..10
0BB00..0
001010..
10010000
01101100
----------------------------
like image 171
Arty Avatar answered Sep 28 '22 23:09

Arty