I am looking for an Earth Mover's distance(or Fast EMD) implementation in python. Any clues on where to find it, I have looked enough on the web. I want to use it in an image retrieval project that I am doing. Thanks.
EDIT: I found a very nice solution using the pulp libararies. This page also has the instruction required to set it up.
The Earth Mover's Distance is Wasserstein with p = 1, usually denoted as W1 or 1-Wasserstein. The 2-Wasserstein metric is computed like 1-Wasserstein, except instead of summing the work values, you sum the squared work values and then take the square root.
Abstract Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution.
The amount of work done is the amount of dirt moved (the “flow”) times the distance moved. For example, the work required to move the 2 units of dirt in d1 a distance of 1 to hole h1 is 2 * 1 = 2. The EMD is the total amount of work divided by the total flow (amount of dirt).
Also, the Wasserstein metric does not require both measures to be on the same probability space, whereas KL divergence requires both measures to be defined on the same probability space.
here is the python code for calculating EARTH MOVERS DISTANCE between two 1D distributions of equal length
def emd (a,b):
earth = 0
earth1 = 0
diff = 0
s= len(a)
su = []
diff_array = []
for i in range (0,s):
diff = a[i]-b[i]
diff_array.append(diff)
diff = 0
for j in range (0,s):
earth = (earth + diff_array[j])
earth1= abs(earth)
su.append(earth1)
emd_output = sum(su)/(s-1)
print(emd_output)
There is an excellent implementation in OpenCv for Python. The name of the function is CalcEMD2 and a simple code to compare histograms of two images would look like this:
#Import OpenCv library
from cv2 import *
### HISTOGRAM FUNCTION #########################################################
def calcHistogram(src):
# Convert to HSV
hsv = cv.CreateImage(cv.GetSize(src), 8, 3)
cv.CvtColor(src, hsv, cv.CV_BGR2HSV)
# Extract the H and S planes
size = cv.GetSize(src)
h_plane = cv.CreateMat(size[1], size[0], cv.CV_8UC1)
s_plane = cv.CreateMat(size[1], size[0], cv.CV_8UC1)
cv.Split(hsv, h_plane, s_plane, None, None)
planes = [h_plane, s_plane]
#Define numer of bins
h_bins = 30
s_bins = 32
#Define histogram size
hist_size = [h_bins, s_bins]
# hue varies from 0 (~0 deg red) to 180 (~360 deg red again */
h_ranges = [0, 180]
# saturation varies from 0 (black-gray-white) to 255 (pure spectrum color)
s_ranges = [0, 255]
ranges = [h_ranges, s_ranges]
#Create histogram
hist = cv.CreateHist([h_bins, s_bins], cv.CV_HIST_ARRAY, ranges, 1)
#Calc histogram
cv.CalcHist([cv.GetImage(i) for i in planes], hist)
cv.NormalizeHist(hist, 1.0)
#Return histogram
return hist
### EARTH MOVERS ############################################################
def calcEM(hist1,hist2,h_bins,s_bins):
#Define number of rows
numRows = h_bins*s_bins
sig1 = cv.CreateMat(numRows, 3, cv.CV_32FC1)
sig2 = cv.CreateMat(numRows, 3, cv.CV_32FC1)
for h in range(h_bins):
for s in range(s_bins):
bin_val = cv.QueryHistValue_2D(hist1, h, s)
cv.Set2D(sig1, h*s_bins+s, 0, cv.Scalar(bin_val))
cv.Set2D(sig1, h*s_bins+s, 1, cv.Scalar(h))
cv.Set2D(sig1, h*s_bins+s, 2, cv.Scalar(s))
bin_val = cv.QueryHistValue_2D(hist2, h, s)
cv.Set2D(sig2, h*s_bins+s, 0, cv.Scalar(bin_val))
cv.Set2D(sig2, h*s_bins+s, 1, cv.Scalar(h))
cv.Set2D(sig2, h*s_bins+s, 2, cv.Scalar(s))
#This is the important line were the OpenCV EM algorithm is called
return cv.CalcEMD2(sig1,sig2,cv.CV_DIST_L2)
### MAIN ########################################################################
if __name__=="__main__":
#Load image 1
src1 = cv.LoadImage("image1.jpg")
#Load image 1
src2 = cv.LoadImage("image2.jpg")
# Get histograms
histSrc1= calcHistogram(src1)
histSrc2= calcHistogram(src2)
# Compare histograms using earth mover's
histComp = calcEM(histSrc1,histSrc2,30,32)
#Print solution
print(histComp)
I tested a code very similar to the previous code with Python 2.7 and Python(x,y). If you want to learn more about Earth Mover's and you want to see an implementation using OpenCV and C++, you can read "Chapter 7: Histograms an Matching" of the book "Learning OpenCV" by Gary Bradski & Adrain Kaebler.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With