COCO dataset label data structure (JSON file)

The COCO dataset currently has three annotation types: object instances, object keypoints, and image captions. All of them are stored in JSON files.

Name	Images	Labels
train link	http://images.cocodataset.org/zips/train2017.zip	http://images.cocodataset.org/annotations/annotations_trainval2017.zip
val link	http://images.cocodataset.org/zips/val2017.zip	http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip

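I need to do object detection, so I use the object instances annotations. The label file I opened is stuff_train2017.json from annotations/stuff_annotations_trainval2017. (For the official COCO detection labels, the corresponding file is instances_train2017.json in annotations_trainval2017.zip; it has the same top-level layout.) The file as a whole is one dictionary: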

{
"info": {"description": null,
         "url": null, 
         "version": null, 
         "year": 2023, 
         "contributor": null, 
         "date_created": "2023-05-27 10:34:38.709025"},
"licenses": [{"url": null, "id": 0, "name": null}],
"images": [{"license": 0,
            "url": null,
            "file_name": "....jpg",
            "height": 800, "width": 800,
            "date_captured": null, "id": 0},
            {...},
            ...,
            {...}],
"type": "instances",
"annotations": [{"id": 0, "image_id": 0, "category_id": 1,
                 "segmentation": [[polygon]], "area": 57142.0, 
                 "bbox": [246.0, 165.0, 310.0, 239.0] ([x, y, width, height], i.e. the top-left corner coordinates plus width and height), "iscrowd": 0},
                {"id": 1, "image_id": 0, "category_id": 1, 
                 "segmentation": [[polygon]], "area": 59602.0, 
                 "bbox": [248.0, 164.0, 311.0, 238.0], "iscrowd": 0},
                {...},
                ...,
                {...}],
"categories": [{"supercategory": null, "id": 0, "name": "_background_"},
			   {"supercategory": null, "id": 1, "name": "cell"}] 
}

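That is the overall structure. Next I will break it down piece by piece.

The file is stored as one dictionary, and five key-value pairs hold all of the information: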

{
    "info" : info,
    "licenses" : [license],
    "images" : [image],
    "categories" : [category],
    "annotations" : [annotation]
}

The keys we need are "images", "categories", and "annotations".
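As a minimal sketch, the three keys can be pulled out with Python's standard json module. The helper name load_coco_sections and its path argument are illustrative assumptions, not part of COCO or of any library.

```python
import json


def load_coco_sections(path):
    """Read a COCO-style JSON file and return the three lists used for detection.

    `path` is a hypothetical location of an annotation file on disk.
    """
    with open(path, "r", encoding="utf-8") as f:
        coco = json.load(f)  # the whole file is one dictionary
    # Each of these values is a list of dictionaries.
    return coco["images"], coco["categories"], coco["annotations"]
```

A call such as images, categories, annotations = load_coco_sections("stuff_train2017.json") would then give the three lists described in the sections that follow.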

"images" structure
The value of "images" is a list of dictionaries; each dictionary describes exactly one image.

"images": [
{"id": 0,                                                # int image id, may start from 0
 "file_name": "0.jpg",                                   # str file name
 "width": 512,                                           # int image width
 "height": 512,                                          # int image height
 "date_captured": "2020-04-14 01:45:07.508146",          # datetime capture date
 "license": 1,                                           # int id of the license that applies
 "coco_url": "",                                         # str COCO image URL
 "flickr_url": ""                                        # str Flickr image URL
}]
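Since every annotation refers to its image only by id, it is often convenient to index the "images" list by "id". The following is a small sketch; the function name index_images_by_id and the sample record are made up for illustration.

```python
def index_images_by_id(images):
    """Map image id -> image dictionary for constant-time lookup."""
    return {img["id"]: img for img in images}


# Hypothetical sample mirroring the fields listed above.
images = [{"id": 0, "file_name": "0.jpg", "width": 512, "height": 512}]
image_by_id = index_images_by_id(images)
print(image_by_id[0]["file_name"])  # 0.jpg
```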

"categories" structure
The value of "categories" is a list of dictionaries; each dictionary describes exactly one category.

"categories":[
{"id": 1,                                 # int category id
 "name": "rectangle",                     # str category name
 "supercategory": "None"                  # str parent class of the category, e.g. truck and car both belong to the vehicle class
}]
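Training code usually needs two lookups from this list: category id to name, and category id to a contiguous class index (the ids in the official COCO files are not contiguous, so they cannot be used directly as indices). A minimal sketch follows; build_category_maps is a hypothetical helper name and the sample categories are invented.

```python
def build_category_maps(categories):
    """Return (id -> name, id -> contiguous index) for a COCO category list."""
    id_to_name = {cat["id"]: cat["name"] for cat in categories}
    # Sort the ids so the contiguous index is reproducible across runs.
    id_to_index = {cat_id: i for i, cat_id in enumerate(sorted(id_to_name))}
    return id_to_name, id_to_index


# Hypothetical categories with a gap in the ids.
categories = [
    {"id": 1, "name": "rectangle", "supercategory": "None"},
    {"id": 3, "name": "circle", "supercategory": "None"},
]
id_to_name, id_to_index = build_category_maps(categories)
print(id_to_name[3], id_to_index[3])  # circle 1
```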

"annotations" structure
The value of "annotations" is a list of dictionaries; each dictionary describes exactly one labeled object.
Each dictionary has 7 key-value pairs.

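"annotations": [
{
 "id": 0,                                   # int id of this labeled object (unique per annotation)
 "image_id": 0,                             # int id of the image that contains the object
 "category_id": 2,                          # int category id of the labeled object
 "iscrowd": 0,                              # 0 or 1; 1 marks a crowd region (a group of objects labeled together), default 0
 "area": 4095.9999999999986,                # float area of the labeled object (64 * 64 = 4096)
 "bbox": [200.0, 416.0, 64.0, 64.0],        # [x, y, width, height] bounding-box coordinates
 "segmentation": [[200.0, 416.0, 264.0, 416.0, 264.0, 480.0, 200.0, 480.0]]
}]
# In "bbox", [x, y, width, height], x and y are the coordinates of the object's top-left corner.
# In this example, "segmentation" [x1, y1, x2, y2, x3, y3, x4, y4] starts at the top-left corner and lists the other three corners clockwise, i.e. [top-left x, top-left y, top-right x, top-right y, bottom-right x, bottom-right y, bottom-left x, bottom-left y]. In general, each inner list is the flat x, y vertex list of one polygon.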
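The relationship between "bbox", "segmentation", and "area" can be checked with a few lines of Python. This is only a sketch: the helper names xywh_to_xyxy, polygon_area, and group_by_image are made up here, and the area is computed with the shoelace formula, which is an assumption about how one might verify the value rather than a statement of how the file was produced.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def polygon_area(flat_points):
    """Shoelace area of a polygon given as [x1, y1, x2, y2, ...]."""
    xs = flat_points[0::2]
    ys = flat_points[1::2]
    total = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)  # next vertex, wrapping around to the first
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def group_by_image(annotations):
    """Map image_id -> list of annotations on that image."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


ann = {
    "id": 0, "image_id": 0, "category_id": 2, "iscrowd": 0,
    "bbox": [200.0, 416.0, 64.0, 64.0],
    "segmentation": [[200.0, 416.0, 264.0, 416.0, 264.0, 480.0, 200.0, 480.0]],
}
print(xywh_to_xyxy(ann["bbox"]))             # [200.0, 416.0, 264.0, 480.0]
print(polygon_area(ann["segmentation"][0]))  # 4096.0
```

The computed area of 4096.0 agrees with the "area" value in the example up to floating-point rounding.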

"segmentation" structure (RLE form, used when "iscrowd" is 1)

"segmentation":{
 "counts":xxxx,
 "size": [426, 640]                         # [height, width] of the mask
   }
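To illustrate what "counts" and "size" encode, here is a sketch of a decoder for the uncompressed form of RLE, where "counts" is a plain list of integers. It assumes the COCO convention that runs alternate between 0 and 1 starting with 0 and that pixels are ordered column by column. The function name decode_uncompressed_rle is hypothetical, and this sketch does not handle the compressed string form of "counts"; for that, pycocotools.mask.decode is the usual tool.

```python
def decode_uncompressed_rle(counts, size):
    """Decode an uncompressed RLE (list of run lengths) into a 2D 0/1 mask.

    counts: run lengths, alternating 0-runs and 1-runs, starting with 0.
    size:   [height, width] of the mask.
    """
    height, width = size
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value  # switch between background (0) and object (1)
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    # Pixels are stored column-major: index = col * height + row.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


# Tiny made-up example: a 2 x 3 mask.
mask = decode_uncompressed_rle([1, 2, 3], [2, 3])
print(mask)  # [[0, 1, 0], [1, 0, 0]]
```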
