Scroll down for additional information on the dataset/annotation format and licensing…
HOOT Dataset Structure
Folder Structure
The HOOT dataset has the following folder structure:
- The dataset is organized by object class and video name.
- The video frames have been extracted in the `.png` format and named using 0-indexing.
- Basic video-level metadata (e.g. target and motion attributes) can be found in the `meta.info` file under each video folder.
- Video annotations are given in the `anno.json` file in the video folder. More information on the annotation format can be found below.
- Videos in the training split are listed in the `train.txt` file.
- Videos in the test split are listed in the `test.txt` file.
- License information for the dataset can be found in the root HOOT folder, in the `license.txt` file.
HOOT/
├── apple/
│   ├── 001/
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   ├── ...
│   │   ├── 000949.png
│   │   ├── meta.info
│   │   └── anno.json
│   ├── 002/
│   ├── ...
│   └── 020/
├── .../
├── zebra/
├── test.txt
├── train.txt
└── license.txt
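The layout above can be turned into paths with a few lines of Python. This is a minimal sketch, not part of the HOOT toolkit: the helper names `video_dir`, `frame_path` and `list_frames` are hypothetical, and it assumes that a video key such as `apple-001` maps to the `apple/001/` folder and that frame names are zero-padded to six digits, as in the tree shown.

```python
from pathlib import Path


def video_dir(root: Path, video_key: str) -> Path:
    """Map a video key such as "apple-001" to its folder, e.g. HOOT/apple/001."""
    # Split on the last hyphen so a class name containing a hyphen still works.
    object_class, video_name = video_key.rsplit("-", 1)
    return root / object_class / video_name


def frame_path(root: Path, video_key: str, frame_id: int) -> Path:
    """Build the path of a 0-indexed frame, zero-padded to six digits."""
    return video_dir(root, video_key) / f"{frame_id:06d}.png"


def list_frames(root: Path, video_key: str) -> list[Path]:
    """Return the .png frames found on disk for a video, sorted by name."""
    return sorted(video_dir(root, video_key).glob("*.png"))


print(frame_path(Path("HOOT"), "apple-001", 20).as_posix())
# HOOT/apple/001/000020.png
```

Because the frame names are zero-padded, sorting the file names alphabetically also sorts them by frame index.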
Annotation Format
HOOT annotations are in the JSON format, and include the following:
- Video key in the `class-video_name` format (e.g. `apple-001`).
- Percentage of frames that have occlusion, `frame_occlusion_level`.
- Median and mean occlusion level of the target across frames (`median_target_occlusion_level` and `mean_target_occlusion_level`), computed using mask IoU with the ground truth target box.
- A list of frame annotations in the form of Python dictionaries. Each dictionary includes:
  - A `frame_id` key with the frame index as its value.
  - `rot_bb` and `aa_bb` keys for the rotated and axis-aligned bounding boxes for the object. These bounding boxes are in the form of `[(x1,y1),(x2,y2),(x3,y3),(x4,y4)]`, and contain floats for each of the points.
  - The `masks` dictionary contains the occlusion masks for each frame. If a specific occlusion mask type does not appear in the frame, its value will be `[]`. Otherwise, the mask is given in the RLE format popularized by the COCO annotations. The toolkit requires `pycocotools` and provides examples for reading the annotations. If there's an occlusion in the frame, any of the following mask types might be annotated:
    - `all`: the mask for all occluders, computed by taking the union of all masks.
    - `s`: the mask for all solid occluders combined.
    - `sp`: the mask for all sparse occluders combined.
    - `st`: the mask for all semi-transparent occluders combined.
    - `t`: the mask for all transparent occluders combined.
  - An `attributes` dictionary with the frame-level occlusion tags for absent, full_occlusion, cut_by_frame, partial_obj_occlusion and similar_occluder.
{
    "video_key": "apple-001",
    "frame_occlusion_level": 0.6715,
    "median_target_occlusion_level": 0.6080,
    "mean_target_occlusion_level": 0.6487,
    "frames":
    [
        ...
        {
            "frame_id": 20,
            "aa_bb": [(x1,y1),(x2,y2),(x3,y3),(x4,y4)],
            "rot_bb": [(x1,y1),(x2,y2),(x3,y3),(x4,y4)],
            "masks":
            {
                "all": "RLE encoded mask",
                "s": "RLE encoded mask",
                "sp": [],
                "st": [],
                "t": "RLE encoded mask"
            },
            "attributes":
            {
                "absent": false,
                "full_occlusion": false,
                "cut-by-frame": true,
                "partial_obj_occlusion": true,
                "similar_occluder": false
            }
        },
        ...
    ]
}
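As an illustration of how a single frame entry might be consumed, the sketch below collapses a four-point box into `(x_min, y_min, x_max, y_max)` and lists which mask types are annotated. It is not taken from the HOOT toolkit: the helper names are hypothetical, the sample values are made up for the example, and it assumes the points are stored in the JSON file as `[x, y]` pairs. The `decode_mask` helper further assumes that each stored mask is a COCO-style RLE object that `pycocotools` can decode directly; the toolkit's own examples are the reference for the exact encoding.

```python
import json


def to_xyxy(points):
    """Collapse four (x, y) corner points into (x_min, y_min, x_max, y_max)."""
    xs = [float(p[0]) for p in points]
    ys = [float(p[1]) for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def annotated_mask_types(frame):
    """Return the mask types whose value is not the empty list []."""
    return [name for name, rle in frame["masks"].items() if rle]


def decode_mask(rle):
    """Decode one RLE mask into a binary array (needs pycocotools installed)."""
    # Assumption: the stored value is a COCO-style RLE object that
    # pycocotools can decode as-is; check the toolkit if it is not.
    from pycocotools import mask as mask_utils
    return mask_utils.decode(rle)


# Hypothetical frame entry with made-up values, shaped like the example above.
# The "RLE" strings are placeholders, not real encoded masks.
frame = json.loads("""
{
  "frame_id": 20,
  "aa_bb": [[10.0, 20.0], [110.0, 20.0], [110.0, 80.0], [10.0, 80.0]],
  "masks": {"all": "RLE", "s": "RLE", "sp": [], "st": [], "t": "RLE"},
  "attributes": {"absent": false, "full_occlusion": false}
}
""")

print(to_xyxy(frame["aa_bb"]))      # (10.0, 20.0, 110.0, 80.0)
print(annotated_mask_types(frame))  # ['all', 's', 't']
```

Taking the minimum and maximum over the corner points works for both `aa_bb` and `rot_bb`; for a rotated box it yields the enclosing axis-aligned rectangle rather than the box itself.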
License Information
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
For inquiries regarding a commercial license to this work please contact the USC Stevens Center for Innovation at licensing@stevens.usc.edu and reference case number 2022-179.