In the TensorFlow detection model zoo, a COCO mAP score is listed for each detection architecture, and it is stated that the higher the mAP score, the higher the accuracy. What I don't understand is how this score is calculated. What is the maximum score it can have? And why does the mAP score differ from dataset to dataset?
To understand MAP (Mean Average Precision), I would start with AP (Average Precision) first.
Suppose we are searching for images of a flower and we provide our image retrieval system with a sample picture of a rose (the query). We get back a bunch of ranked images (from most likely to least likely), and usually not all of them are correct. So we compute the precision at every correctly returned image and then take an average.
Example:
If our returned result is
1, 0, 0, 1, 1, 1
where 1 is an image of a flower, while 0 is not, then the precision at every correct point is:
Precision at each correct image = 1/1, 0, 0, 2/4, 3/5, 4/6 (the zeros mark the incorrect images, which contribute nothing)
Summation of these precisions = 83/30
Average Precision = (Precision summation)/(total correct images) = 83/120
Side note:
This section provides a detailed explanation behind the calculation of precision at each correct image in case you're still confused by the above fractions.
For illustration purposes, let 1, 0, 0, 1, 1, 1 be stored in an array, so results[0] = 1, results[1] = 0, etc.
Let totalCorrectImages = 0, totalImagesSeen = 0, pointPrecision = 0.
The formula for pointPrecision is totalCorrectImages / totalImagesSeen.
At results[0]: totalCorrectImages = 1, totalImagesSeen = 1, hence pointPrecision = 1
Since results[1] != 1, we ignore it, but totalImagesSeen = 2 and totalCorrectImages = 1
Since results[2] != 1, totalImagesSeen = 3 and totalCorrectImages = 1
At results[3]: totalCorrectImages = 2, totalImagesSeen = 4, hence pointPrecision = 2/4
At results[4]: totalCorrectImages = 3, totalImagesSeen = 5, hence pointPrecision = 3/5
At results[5]: totalCorrectImages = 4, totalImagesSeen = 6, hence pointPrecision = 4/6
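The walkthrough above translates directly into code. Here is a minimal Python sketch (the function name average_precision is my own; the variable names mirror the side note):

def average_precision(results):
    # AP for a ranked list of 1s (correct images) and 0s (incorrect images).
    total_correct_images = 0
    precision_sum = 0.0
    for total_images_seen, result in enumerate(results, start=1):
        if result == 1:
            total_correct_images += 1
            # pointPrecision = totalCorrectImages / totalImagesSeen
            precision_sum += total_correct_images / total_images_seen
    return precision_sum / total_correct_images if total_correct_images else 0.0

print(average_precision([1, 0, 0, 1, 1, 1]))  # 83/120, roughly 0.6917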
A simple way to interpret AP is to produce a combination of zeros and ones which will give the required AP. For example, an AP of 0.5 could have results like
0, 1, 0, 1, 0, 1, ...
where every second image is correct, while an AP of 0.333 has
0, 0, 1, 0, 0, 1, 0, 0, 1, ...
where every third image is correct. For an AP of 0.1, every 10th image will be correct, and that is definitely a bad retrieval system. On the other hand, for an AP above 0.5, we will encounter more correct images than wrong ones in the top results, which is definitely a good sign.
MAP is just an extension of AP: you simply take the average of the AP scores over a set of queries. The above interpretation of AP scores also holds for MAP. MAP ranges from 0 to 1 (often reported as a percentage between 0 and 100); higher is better.
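As a rough sketch of how MAP follows from AP (repeating the same hypothetical average_precision helper so the snippet runs on its own), the per-query APs are simply averaged; the snippet also checks the 0.5 and 0.333 patterns described above:

def average_precision(results):
    correct, precision_sum = 0, 0.0
    for seen, r in enumerate(results, start=1):
        if r == 1:
            correct += 1
            precision_sum += correct / seen
    return precision_sum / correct if correct else 0.0

def mean_average_precision(queries):
    # MAP = mean of the AP scores over a set of queries.
    return sum(average_precision(q) for q in queries) / len(queries)

print(average_precision([0, 1, 0, 1, 0, 1]))           # 0.5   (every 2nd image correct)
print(average_precision([0, 0, 1, 0, 0, 1, 0, 0, 1]))  # 0.333 (every 3rd image correct)

queries = [[1, 0, 0, 1, 1, 1],   # AP = 83/120
           [0, 1, 0, 1, 0, 1]]   # AP = 0.5
print(mean_average_precision(queries))  # roughly 0.596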
AP formula on Wikipedia
MAP formula on Wikipedia
Credits to this blog
EDIT I:
The same concept applies to object detection. In this scenario you calculate the AP for each class, given by the area under that class's precision-recall curve, and then average the per-class APs to obtain the mAP.
For more details, refer to sections 3.4.1 and 4.4 of the 2012 Pascal VOC Dev Kit. The related paper can be found here.
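As an illustration only (this is not the actual VOC dev kit code), the per-class AP can be computed as the area under the precision-recall curve built from ranked detections. The all-points interpolation below follows the VOC 2010+ style, and the detection scores and true/false-positive flags are made up:

import numpy as np

def detection_ap(scores, is_true_positive, num_ground_truth):
    # Area under the precision-recall curve for a single class.
    # scores           : confidence of each detection
    # is_true_positive : 1 if the detection matched a ground-truth box (e.g. IoU >= 0.5), else 0
    # num_ground_truth : number of ground-truth boxes for this class
    order = np.argsort(-np.asarray(scores, dtype=float))   # rank detections by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / num_ground_truth
    precision = tp_cum / (tp_cum + fp_cum)

    # Make precision monotonically decreasing, then sum rectangles between recall steps.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Hypothetical detections for one class; the mAP is the mean of this AP over all classes.
ap = detection_ap(scores=[0.9, 0.8, 0.7, 0.6],
                  is_true_positive=[1, 0, 1, 1],
                  num_ground_truth=4)
print(ap)  # 0.625 for this made-up example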