
How can I download a specific part of the COCO dataset?

I am developing an object detection model to detect ships using YOLO. I want to use the COCO dataset. Is there a way to download only the images that contain ships, along with their annotations?

Shobhit Kumar asked Jun 29 '18 10:06


People also ask

Can the COCO dataset be used commercially?

The MS COCO annotations are licensed under a Creative Commons Attribution 4.0 License, which lets you distribute, remix, tweak, and build upon the work, even commercially, as long as you credit the original creator. The images themselves are not owned by the COCO Consortium, and their use must abide by the Flickr Terms of Use.


3 Answers

To download images from a specific category, you can use the COCO API. Here's a demo notebook going through this and other usages. The overall process is as follows:

  • Install pycocotools
  • Download one of the annotation JSON files from the COCO dataset

Here's an example of how to download the images that contain a person and save them to a local folder:

from pycocotools.coco import COCO
import requests

# instantiate COCO specifying the annotations json path
coco = COCO('...path_to_annotations/instances_train2014.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)

This returns a list of dictionaries with basic information about each image, including its URL. We can now use requests to GET the images and write them to a local folder:

# Save the images into a local folder
for im in images:
    response = requests.get(im['coco_url'], timeout=30)
    response.raise_for_status()  # stop on a failed download instead of saving an error page
    with open('...path_saved_ims/coco_person/' + im['file_name'], 'wb') as handler:
        handler.write(response.content)

Note that this saves every image in the specified category, so you may want to slice the images list down to the first n.
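
As a minimal sketch of that slicing step: the helper `plan_downloads` below is hypothetical (it is not part of pycocotools). It takes records shaped like the dictionaries returned by `coco.loadImgs`, keeps the first n, and pairs each URL with a destination path. The example records and the `coco_subset` folder name are placeholders.

```python
from pathlib import Path


def plan_downloads(images, out_dir, n=None):
    """Return (url, destination) pairs for the first n image records.

    With n=None every record is kept.
    """
    subset = images if n is None else images[:n]
    return [(im["coco_url"], Path(out_dir) / im["file_name"]) for im in subset]


# Placeholder records shaped like the output of coco.loadImgs
images = [
    {"coco_url": "http://example.com/a.jpg", "file_name": "a.jpg"},
    {"coco_url": "http://example.com/b.jpg", "file_name": "b.jpg"},
    {"coco_url": "http://example.com/c.jpg", "file_name": "c.jpg"},
]

for url, dest in plan_downloads(images, "coco_subset", n=2):
    print(url, "->", dest)
```

Each pair can then be fed to the same requests loop, so only the first n files are fetched.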

yatu answered Oct 17 '22 13:10


As far as I know, the COCO dataset itself has no category for "ships". The closest category is "boat". You can check the available categories here: http://cocodataset.org/#overview

BTW, there are ships inside the boat category too.

If you want to just select images of a specific COCO category, you might want to do something like this (taken and edited from COCO's official demos):

# assumes `coco` is a COCO instance created from an annotations json
# display COCO categories
cats = coco.loadCats(coco.getCatIds())
nms = [cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

# get all images containing given categories (I'm selecting the "bird")
catIds = coco.getCatIds(catNms=['bird'])
imgIds = coco.getImgIds(catIds=catIds)
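
Since the question also asks for the annotations and mentions YOLO, here is a rough sketch that works directly on the dictionary you would get from `json.load()` on an instances JSON file. It keeps only the "boat" annotations and converts each COCO box (`[x_min, y_min, width, height]` in pixels) to YOLO's normalised centre format. The helper names `subset_by_category` and `coco_bbox_to_yolo` are made up for illustration, and the tiny `dataset` dictionary is a stand-in, not real COCO data.

```python
def subset_by_category(dataset, category_name):
    """Keep only the annotations of one category and the images they refer to."""
    cat_ids = {c["id"] for c in dataset["categories"] if c["name"] == category_name}
    annotations = [a for a in dataset["annotations"] if a["category_id"] in cat_ids]
    image_ids = {a["image_id"] for a in annotations}
    images = [im for im in dataset["images"] if im["id"] in image_ids]
    categories = [c for c in dataset["categories"] if c["id"] in cat_ids]
    return {"images": images, "annotations": annotations, "categories": categories}


def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert COCO [x_min, y_min, width, height] in pixels to
    YOLO (x_center, y_center, width, height) normalised to [0, 1]."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_width,
        (y + h / 2) / img_height,
        w / img_width,
        h / img_height,
    )


# Stand-in for json.load(open(<path to an instances_*.json file>))
dataset = {
    "categories": [{"id": 9, "name": "boat"}, {"id": 16, "name": "bird"}],
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 200, "height": 100},
        {"id": 2, "file_name": "b.jpg", "width": 200, "height": 100},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 9, "bbox": [50, 20, 100, 40]},
        {"id": 11, "image_id": 2, "category_id": 16, "bbox": [0, 0, 10, 10]},
    ],
}

boats = subset_by_category(dataset, "boat")
sizes = {im["id"]: (im["width"], im["height"]) for im in boats["images"]}
for ann in boats["annotations"]:
    width, height = sizes[ann["image_id"]]
    print(ann["image_id"], coco_bbox_to_yolo(ann["bbox"], width, height))
```

The filtered dictionary keeps the same top-level keys as the original file, so it can be written back out with `json.dump` if you want a smaller annotations file.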

Reine_Ran_ answered Oct 17 '22 15:10


Nowadays there is a package called fiftyone with which you could download the MS COCO dataset and get the annotations for specific classes only. More information about installation can be found at https://github.com/voxel51/fiftyone#installation.

Once you have the package installed, run the following to get, say, the "person" and "car" classes:

import fiftyone.zoo as foz

# To download the COCO dataset for only the "person" and "car" classes
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="train",
    label_types=["detections", "segmentations"],
    classes=["person", "car"],
    # max_samples=50,
)

If desired, you can uncomment the last option to set a maximum number of samples. You can also change the "train" split to "validation" to obtain the validation split instead.

To visualize the dataset downloaded, simply run the following:

# Visualize the dataset in the FiftyOne App
import fiftyone as fo
session = fo.launch_app(dataset)

If you would like to download the "train", "validation", and "test" splits in a single function call, you could do the following:

dataset = foz.load_zoo_dataset(
    "coco-2017",
    splits=["train", "validation", "test"],
    label_types=["detections", "segmentations"],
    classes=["person"],
    # max_samples=50,
)

Kris Stern answered Oct 17 '22 13:10