Ground-truth data collection and evaluation for computer vision

I am starting to develop a computer vision application that involves tracking humans. I want to build ground-truth metadata for the videos that will be recorded in this project. The metadata will probably need to be hand-labeled and will mainly consist of the locations of the humans in each image. I would like to use this metadata to evaluate the performance of my algorithms.

I could of course build a labeling tool myself using, e.g., Qt and/or OpenCV, but I was wondering whether there is some kind of de facto standard for this. I came across Viper, but it seems dead and it is not as easy to use as I had hoped. Other than that, I haven't found much.

Does anybody here have recommendations on which software, standard, or method to use for both the labeling and the evaluation? My preference is for something C++-oriented, but this is not a hard constraint.

Kind regards and thanks in advance! Tom


asked May 16 '12 12:05

Goosebumps

1 Answer

I've had another look at vatic and got it to work. It is an online video annotation tool that runs on Linux and is meant for crowdsourcing via a commercial service. However, there is also an offline mode, in which that service is not required and the software runs standalone.

The installation is described in detail in the enclosed README file. It involves, among other things, setting up an Apache and a MySQL server, some Python packages, and ffmpeg. It is not that difficult if you follow the README. (I mentioned that I had some issues with my proxy, but these were not related to this software package.)

You can try the online demo. The default output is like this:

0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"
0 298 111 318 182 2 1 0 1 "person"
0 296 110 318 181 3 1 0 1 "person"
0 294 110 318 181 4 1 0 1 "person"
0 292 109 318 180 5 1 0 1 "person"
0 290 108 318 180 6 1 0 1 "person"
0 288 108 318 179 7 1 0 1 "person"
0 286 107 317 179 8 1 0 1 "person"
0 284 106 317 178 9 1 0 1 "person"

Each line contains 10 or more columns, separated by spaces. The columns are defined as follows:

1   Track ID. All rows with the same ID belong to the same path.
2   xmin. The top left x-coordinate of the bounding box.
3   ymin. The top left y-coordinate of the bounding box.
4   xmax. The bottom right x-coordinate of the bounding box.
5   ymax. The bottom right y-coordinate of the bounding box.
6   frame. The frame that this annotation represents.
7   lost. If 1, the annotation is outside of the view screen.
8   occluded. If 1, the annotation is occluded.
9   generated. If 1, the annotation was automatically interpolated.
10  label. The label for this annotation, enclosed in quotation marks.
11+ attributes. Each column after this is an attribute.
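
To make the column layout above concrete, here is a minimal parsing sketch in Python. It is not part of vatic; the names `Annotation`, `parse_line`, and `parse_text` are made up for illustration, and it assumes that the coordinates are integers and that the label and any attributes are quoted the way they are in the sample output.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """One row of the text output (hypothetical container, not a vatic type)."""
    track_id: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    frame: int
    lost: bool
    occluded: bool
    generated: bool
    label: str
    attributes: List[str] = field(default_factory=list)


def parse_line(line: str) -> Annotation:
    # shlex.split honours the quotation marks, so a label such as
    # "traffic light" would stay a single field.
    parts = shlex.split(line)
    if len(parts) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(parts)}")
    # Columns 1-9 are numeric; this sketch assumes they are integers.
    nums = [int(p) for p in parts[:9]]
    return Annotation(
        track_id=nums[0],
        xmin=nums[1],
        ymin=nums[2],
        xmax=nums[3],
        ymax=nums[4],
        frame=nums[5],
        lost=bool(nums[6]),
        occluded=bool(nums[7]),
        generated=bool(nums[8]),
        label=parts[9],
        attributes=parts[10:],  # columns 11+ are attributes
    )


def parse_text(text: str) -> List[Annotation]:
    # Skip blank lines so a trailing newline does not raise.
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

Calling `parse_text` on the contents of an exported file gives one `Annotation` per row, which can then be grouped by `track_id` or `frame` as needed.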

It can also provide output in XML, JSON, pickle, LabelMe, and PASCAL VOC formats.

So, all in all, this does pretty much what I wanted, and it is also rather easy to use. I am still interested in other options though!
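
On the evaluation side, vatic only produces the labels, so the comparison against a tracker's output has to be done separately. One common measure, which I am assuming here rather than taking from vatic, is the intersection over union (IoU) between the ground-truth box and the predicted box in each frame. The sketch below handles a single track only; the function names and the 0.5 threshold are illustrative choices, not a standard API.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hit_rate(gt: Dict[int, Box], pred: Dict[int, Box], threshold: float = 0.5) -> float:
    """Fraction of ground-truth frames where the prediction overlaps enough.

    Both arguments map a frame number to a box. A frame with no prediction
    counts as a miss; frames without ground truth are ignored.
    """
    if not gt:
        return 0.0
    hits = sum(
        1
        for frame, box in gt.items()
        if frame in pred and iou(box, pred[frame]) >= threshold
    )
    return hits / len(gt)
```

For several people per frame, the boxes would first have to be matched between ground truth and tracker output, which this sketch does not attempt.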


answered Oct 13 '22 06:10

Goosebumps


Tags:

image-processing

metadata

computer-vision

evaluation

tracking
