Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Coco Json file to CSV format (path/to/image.jpg,x1,y1,x2,y2,class_name)

I would like to convert my coco JSON file as follows:

The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is:

path/to/image.jpg,x1,y1,x2,y2,class_name

A full example:

*/data/imgs/img_001.jpg,837,346,981,456,cow 
/data/imgs/img_002.jpg,215,312,279,391,cat
/data/imgs/img_002.jpg,22,5,89,84,bird

This defines a dataset with 3 images: img_001.jpg contains a cow, img_002.jpg contains a cat and a bird, and img_003.jpg contains no interesting objects/animals.

How could I do that?

like image 281
jvl Avatar asked Sep 05 '25 02:09

jvl


2 Answers

I have such function.

def convert_coco_json_to_csv(filename):
    import pandas as pd
    import json
    
    # COCO2017/annotations/instances_val2017.json
    s = json.load(open(filename, 'r'))
    out_file = filename[:-5] + '.csv'
    out = open(out_file, 'w')
    out.write('id,x1,y1,x2,y2,label\n')

    all_ids = []
    for im in s['images']:
        all_ids.append(im['id'])

    all_ids_ann = []
    for ann in s['annotations']:
        image_id = ann['image_id']
        all_ids_ann.append(image_id)
        x1 = ann['bbox'][0]
        x2 = ann['bbox'][0] + ann['bbox'][2]
        y1 = ann['bbox'][1]
        y2 = ann['bbox'][1] + ann['bbox'][3]
        label = ann['category_id']
        out.write('{},{},{},{},{},{}\n'.format(image_id, x1, y1, x2, y2, label))

    all_ids = set(all_ids)
    all_ids_ann = set(all_ids_ann)
    no_annotations = list(all_ids - all_ids_ann)
    # Output images without any annotations
    for image_id in no_annotations:
        out.write('{},{},{},{},{},{}\n'.format(image_id, -1, -1, -1, -1, -1))
    out.close()

    # Sort file by image id
    s1 = pd.read_csv(out_file)
    s1.sort_values('id', inplace=True)
    s1.to_csv(out_file, index=False)
like image 60
ZFTurbo Avatar answered Sep 07 '25 16:09

ZFTurbo


Here is a function I use to convert Coco format to AutoML CSV format for image object detection annotated data:

def convert_coco_json_to_csv(filename,bucket):
    import pandas as pd
    import json
    
    s = json.load(open(filename, 'r'))
    out_file = filename[:-5] + '.csv'

    with open(out_file, 'w') as out:
      out.write('GCS_FILE_PATH,label,X_MIN,Y_MIN,,,X_MAX,Y_MAX,,\n')
      file_names = [f"{bucket}/{image['file_name']}" for image in s['images']]
      categories = [cat['name'] for cat in s['categories']]
      for label in s['annotations']:
        #The COCO bounding box format is [top left x position, top left y position, width, height]. 
        # for AutoML: For example, a bounding box for the entire image is expressed as (0.0,0.0,,,1.0,1.0,,), or (0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0).
        HEIGHT = s['images'][label['image_id']]['height']
        WIDTH = s['images'][label['image_id']]['width']
        X_MIN = label['bbox'][0]/WIDTH
        X_MAX = (label['bbox'][0] + label['bbox'][2]) / WIDTH
        Y_MIN = label['bbox'][1] / HEIGHT
        Y_MAX = (label['bbox'][1] + label['bbox'][3]) / HEIGHT
        out.write(f"{file_names[label['image_id']]},{categories[label['category_id']]},{X_MIN},{Y_MIN},,,{X_MAX},{Y_MAX},,\n")


And simply you can use it by calling the function with the file name and the gs storage where images were uploaded:

convert_coco_json_to_csv("/content/train_annotations.coco.json", "gs://[bucket name]")
like image 28
Ahmed Roshdy Avatar answered Sep 07 '25 17:09

Ahmed Roshdy