
How to map function directly over list of lists?

I have built a pixel classifier for images: for each pixel in the image, I want to determine which predefined color cluster it belongs to. It works, but at about 5 minutes per image, I suspect I am doing something unpythonic that can surely be optimized.

How can we map the function directly over the list of lists?

#First I convert my image to a list
#Below list represents a true image size
list1=[[255, 114, 70],
[120, 89, 15],
[247, 190, 6],
[41, 38, 37],
[102, 102, 10],
[255,255,255]]*3583180

Then we define the clusters to map the colors to and the function to do so (which is taken from the PIL library)

#Colors of interest
RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
BLACK=[41, 38, 37]
GREY=[102, 102, 10]
WHITE=[255,255,255]

Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]

import math

#Function to find the closest cluster by Euclidean distance (root of summed squares) in RGB
def distance(c1, c2):
    (r1,g1,b1) = c1
    (r2,g2,b2) = c2
    return math.sqrt((r1 - r2)**2 + (g1 - g2) ** 2 + (b1 - b2) **2)

What remains is to match every color, and make a new list with matched indexes from the original Colors:

from tqdm import tqdm

Filt_lab=[]

#Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    closest_colors = sorted(Colors, key=lambda color: distance(color, pixel))
    closest_color = closest_colors[0]

    for num, clust in enumerate(Colors):
        if list(clust) == list(closest_color):
            Filt_lab.append(num)

Processing a single image takes approximately 5 minutes, which is acceptable, but is there a method that reduces this time substantially?

36%|███▌ | 7691707/21499080 [01:50<03:18, 69721.86it/s]

Expected outcome of Filt_lab:

[0, 1, 2, 4, 3, 5]*3583180
Rivered asked Jul 23 '21 07:07



5 Answers

You can use Numba's JIT compiler to speed up the code by a large margin. The idea is to build classified_pixels on the fly by iterating over the colours for each pixel. The colours are stored in a NumPy array in which the index is the colour key. The whole computation can run in parallel. This avoids creating many temporary arrays that have to be written to and read from memory, and it avoids allocating a lot of memory. Moreover, the data types can be chosen so that the resulting array is smaller in memory (and therefore faster to write and read). Here is the final script:

import numpy as np
import numba as nb

@nb.njit('int32[:,::1](int32[:,:,::1], int32[:,::1])', parallel=True)
def classify(image, colors):
    classified_pixels = np.empty((image.shape[0], image.shape[1]), dtype=np.int32)
    for i in nb.prange(image.shape[0]):
        for j in range(image.shape[1]):
            minId = -1
            minValue = 3*255*255 + 1 # Larger than the maximum possible squared distance (3 * 255**2)
            ir, ig, ib = image[i, j]
            # Find the color index with the minimum difference
            for k in range(len(colors)):
                cr, cg, cb = colors[k]
                total = (ir-cr)**2 + (ig-cg)**2 + (ib-cb)**2
                if total < minValue:
                    minValue = total
                    minId = k
            classified_pixels[i, j] = minId
    return classified_pixels

# Representative image
np.random.seed(42)
imarray = np.random.rand(3650,2000,3) * 255
image = imarray.astype(np.int32)

# Colors of interest
RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# Build a Numpy array rather than a dict
colors = np.array([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE], dtype=np.int32)

# Actual classification
classified_pixels = classify(image, colors)

# Convert array to list
cl_pixel_list = classified_pixels.reshape(classified_pixels.shape[0] * classified_pixels.shape[1]).tolist()

# Print
print(cl_pixel_list[0:10])

This implementation takes about 0.19 seconds on my 6-core machine. It is about 15 times faster than the latest answer provided so far and more than a thousand times faster than the initial implementation. Note that about half the time is spent in tolist(), since the classify function itself is very fast.

Jérôme Richard answered Oct 18 '22 03:10


It seems your computer is very fast :)

This is the output of your code partway through a run on my system:

  0%|          | 5635/21499080 [00:44<46:51:14, 127.43it/s]

But I have rewritten your code using TensorFlow, and now it runs in about 3 seconds :)

import math
import os
from time import time

import numpy as np
from tqdm import tqdm

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
import tensorflow as tf

list1 = [[255, 114, 70],
         [120, 89, 15],
         [247, 190, 6],
         [41, 38, 37],
         [102, 102, 10],
         [255, 255, 255]] * 3583180
list1_np = tf.constant(list1)
RED = [255, 114, 70]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]
Colors = tf.constant([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE])
t = time()
ans = tf.argmin(np.array([tf.math.reduce_sum((list1_np - c) ** 2, axis=1) for c in Colors]), axis=0)
print(time() - t)
print(ans)


# and now your code

def distance(c1, c2):
    (r1, g1, b1) = c1
    (r2, g2, b2) = c2
    return math.sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2)


t = time()
Filt_lab = []

# Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    closest_colors = sorted(Colors, key=lambda color: distance(color, pixel))
    closest_color = closest_colors[0]

    for num, clust in enumerate(Colors):
        if list(clust) == list(closest_color):
            Filt_lab.append(num)
print(time() - t)

the output is:

3.1714584827423096
tf.Tensor([0 1 2 ... 4 3 5], shape=(21499080,), dtype=int64)
  0%|          | 951/21499080 [00:07<47:36:50, 125.42it/s]

NOTE1: you can omit some of the imports if you delete the second part.

NOTE2: when you only want to compare distances with each other, there is no need to take the square root.
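
NOTE2 can be checked on a handful of pixels. The following is only a sketch, assuming NumPy is available; `pixels`, `colors` and `same_winner` are hypothetical names introduced for this check, and the random pixels are made-up test data.

```python
import numpy as np

# The six cluster colours, in the order of the question's `Colors` list
colors = np.array([[255, 114, 70], [120, 89, 15], [247, 190, 6],
                   [102, 102, 10], [41, 38, 37], [255, 255, 255]], dtype=np.int64)

# A few random test pixels (hypothetical data, only for this check)
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(1000, 3), dtype=np.int64)

# Squared Euclidean distances, shape (n_pixels, n_colors)
squared = ((pixels[:, None, :] - colors[None, :, :]) ** 2).sum(axis=2)

# sqrt is monotonically increasing, so it cannot change which entry is smallest
same_winner = np.array_equal(squared.argmin(axis=1),
                             np.sqrt(squared).argmin(axis=1))
print(same_winner)
```

Working with the integer squared distances also avoids floating-point arithmetic altogether.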

Kasra answered Oct 18 '22 02:10


Using numpy:

import numpy as np

#Representative image
imarray = np.int64(np.random.rand(3583180*6,3) * 255)
#Or make it with np.int64(your_list_of_lists) if you already have that list of lists; axes: pixel, color_channels
#(a signed type is used so that negative differences do not rely on unsigned wraparound)

RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
BLACK=[41, 38, 37]
GREY=[102, 102, 10]
WHITE=[255,255,255]
#your list of colors
Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]
#again converted to numpy
Colors_np = np.int64(Colors) #axes: colors, color_channels

#Compute all distances, or rather the squared distances, as squaring has no
#effect on which one is minimal and we can then drop the sqrt computation.
#Extend both numpy arrays to have axes [pixel, color, color_channels] with `np.newaxis`,
#take the difference,
#then the square,
#and then the sum across color channels
distances = np.sum((imarray[:,np.newaxis, :] - Colors_np[np.newaxis, :, :])**2, 2)
#the difference has axes [pixel, color, color_channels]; summing over axis 2 leaves the [pixel, color] axes
#You want the index of the minimum over the color axis, so:
closest_color_indices = np.argmin(distances, 1)

#written as one line and timed with %timeit in ipython (on a single core):
#%timeit np.argmin(np.sum((imarray[:,np.newaxis, :] - Colors_np[np.newaxis, :, :])**2, 2), 1)
#6.11 s +- 79.4 ms per loop (mean +- std. dev. of 7 runs, 1 loop each)

So this takes about 6.11s for 3583180*6=21499080 pixels and 6 possible colors.
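
One thing to keep in mind is that the broadcast difference is a temporary array with axes [pixel, color, color_channels], which for 21499080 pixels, 6 colors and 8-byte integers is roughly 3 GB. If that is too much, the same computation can be done in slices. The following is only a sketch of that idea; `classify_in_chunks` and `chunk_size` are hypothetical names, not part of the code above.

```python
import numpy as np

def classify_in_chunks(pixels, colors, chunk_size=1_000_000):
    """Return, for every pixel, the index of the closest colour (squared distance)."""
    pixels = np.asarray(pixels, dtype=np.int32)
    colors = np.asarray(colors, dtype=np.int32)
    out = np.empty(len(pixels), dtype=np.uint8)  # assumes at most 256 colours
    for start in range(0, len(pixels), chunk_size):
        chunk = pixels[start:start + chunk_size]
        # axes: [pixel, color, color_channels], but only for this slice
        diff = chunk[:, np.newaxis, :] - colors[np.newaxis, :, :]
        out[start:start + chunk_size] = np.argmin((diff ** 2).sum(axis=2), axis=1)
    return out

# Small check using the colours and pixel pattern from the question
colors = [[255, 114, 70], [120, 89, 15], [247, 190, 6],
          [102, 102, 10], [41, 38, 37], [255, 255, 255]]
sample = [[255, 114, 70], [120, 89, 15], [247, 190, 6],
          [41, 38, 37], [102, 102, 10], [255, 255, 255]] * 5
labels = classify_in_chunks(sample, colors, chunk_size=7)
print(labels[:6].tolist())  # [0, 1, 2, 4, 3, 5], as expected in the question
```

The chunk size trades memory for loop overhead; the value above is an arbitrary starting point, not a tuned one.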

Koen G. answered Oct 18 '22 02:10


Just quick speedups:

  1. You can omit math.sqrt()
  2. Create a dictionary of colors instead of a list (that way you don't have to search for the index on each iteration)
  3. Use min() instead of sorted()

from tqdm import tqdm

list1 = [
    [255, 114, 70],
    [120, 89, 15], 
    [247, 190, 6],
    [41, 38, 37],
    [102, 102, 10],
    [255, 255, 255],
] * 3583180


RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# create a dictionary instead of a list:
Colors = {
    i: c
    for i, c in enumerate([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE])
}


# Function to find the closest cluster by squared Euclidean distance in RGB - EDIT: square root omitted
def distance(c1, c2):
    (r1, g1, b1) = c1
    (r2, g2, b2) = c2
    return (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2   # <-- you can omit math.sqrt


Filt_lab = []

# Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    # use min() instead of sorted:
    closest_color = min(
        Colors, key=lambda color: distance(Colors[color], pixel)
    )
    Filt_lab.append(closest_color)

On my computer the speed went up from ~108000.0it/s to ~155000.00it/s.


Note: for this kind of task it is better to use the numpy library.

Andrej Kesely answered Oct 18 '22 02:10


You can try creating and using a lookup table with 256 * 256 * 256 elements.

import numpy as np
from scipy.spatial import cKDTree
imarray = np.uint8(np.random.rand(3583180*6,3) * 255)

code=np.array([1, 256, 256*256])
RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
GREY=[102, 102, 10]
BLACK=[41, 38, 37]
WHITE=[255,255,255]
#your list of colors
Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]
#again converted to numpy
Colors_np = np.uint8(Colors) #axes: colors, color_channels

x=np.arange(256, dtype=np.uint8)
rgb=np.array(np.meshgrid(x, x, x)).T.reshape(-1,3)
rgb[:,[0, 1, 2]]=rgb[:,[1, 0, 2]] #swap columns
# rgb is a table of all colors

voronoi_kdtree = cKDTree(Colors_np) # Voronoi by base Colors

_, test_point_regions = voronoi_kdtree.query(rgb)
# test_point_regions is lookup table (LUT)

result=test_point_regions[np.dot(imarray, code)]

assert np.all(test_point_regions[np.dot(Colors_np, code)]==np.array([0, 1, 2, 3, 4, 5]))
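
When an image contains far fewer distinct colours than pixels, a related option is to build the table only for the colours that actually occur. The following is a sketch of that variation, assuming NumPy only; `classify_unique` is a hypothetical helper and is not part of the code above. For random test images such as `imarray` the number of distinct colours is large, so the gain may be small there.

```python
import numpy as np

def classify_unique(pixels, colors):
    """Classify each distinct colour once, then map the result back to all pixels."""
    pixels = np.asarray(pixels, dtype=np.int64)
    colors = np.asarray(colors, dtype=np.int64)
    # Pack each RGB triple into one integer, as done with `code` above
    packed = pixels @ np.array([1, 256, 256 * 256])
    unique_packed, inverse = np.unique(packed, return_inverse=True)
    # Unpack the distinct colours back into RGB triples
    unique_rgb = np.stack([unique_packed % 256,
                           (unique_packed // 256) % 256,
                           unique_packed // (256 * 256)], axis=1)
    # Small lookup table: closest colour index for every distinct colour
    diff = unique_rgb[:, np.newaxis, :] - colors[np.newaxis, :, :]
    table = (diff ** 2).sum(axis=2).argmin(axis=1)
    return table[inverse]

# Small check using the colours and pixel pattern from the question
colors_list = [[255, 114, 70], [120, 89, 15], [247, 190, 6],
               [102, 102, 10], [41, 38, 37], [255, 255, 255]]
sample_pixels = [[255, 114, 70], [120, 89, 15], [247, 190, 6],
                 [41, 38, 37], [102, 102, 10], [255, 255, 255]] * 5
unique_labels = classify_unique(sample_pixels, colors_list)
print(unique_labels[:6].tolist())  # [0, 1, 2, 4, 3, 5], as expected in the question
```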

Alex Alex answered Oct 18 '22 04:10